Abstract: Density evolution is one of the most powerful analytical
tools for low-density parity-check (LDPC) codes and graph
codes with message passing decoding algorithms. With channel 
symmetry as one of its fundamental assumptions, density evolu- 
tion (DE) has been widely and successfully applied to different 
channels, including binary erasure channels, binary symmetric 
channels, binary additive white Gaussian noise channels, etc. This 
paper generalizes density evolution for non-symmetric memoryless 
channels, which in turn broadens the applications to general 
memoryless channels, e.g. z-channels, composite white Gaussian 
noise channels, etc. The central theorem underpinning this 
generalization is the convergence to perfect projection for any 
fixed size supporting tree. A new iterative formula of the same 
complexity is then presented and the necessary theorems for 
the performance concentration theorems are developed. Several 
properties of the new density evolution method are explored, 
including stability results for general asymmetric memoryless 
channels. Simulations, code optimizations, and possible new 
applications suggested by this new density evolution method are 
also provided. This result is also used to prove the typicality 
of linear LDPC codes among the coset code ensemble when the 
minimum check node degree is sufficiently large. It is shown 
that the convergence to perfect projection is essential to the 
belief propagation algorithm even when only symmetric channels 
are considered. Hence the proof of the convergence to perfect 
projection serves also as a completion of the theory of classical 
density evolution for symmetric memoryless channels. 

Index Terms: Low-density parity-check (LDPC) codes, density
evolution, sum-product algorithm, asymmetric channels,
z-channels, rank of random matrices.



I. Introduction 

SINCE the advent of turbo codes [1] and the rediscovery 
of low-density parity-check (LDPC) codes [2], [3] in the 
mid 1990's, graph codes [4] have attracted significant attention 
because of their capacity-approaching error correcting capability
and the inherent low complexity (O(n) or O(n log n),
where n is the codeword length) of message passing decoding
algorithms [3]. The near-optimal performance of graph codes 
is generally based on pseudo-random interconnections and 
Pearl's belief propagation (BP) algorithm [5], which is a
distributed message-passing algorithm efficiently computing a
posteriori probabilities in cycle-free inference networks. Turbo
codes can also be viewed as a variation of LDPC codes, as
discussed in [3] and [6].

Manuscript received August 15, 2002; revised August 22, 2005. This
work was supported in part by the National Science Foundation under
Grants No. CCR-9980590 and CCR-0312413, the Army Research Laboratory
Communications Technology Alliance under Contract No. DAAD19-01-2-0011,
the Army Research Office under Contract No. DAAD19-00-1-0466,
and the New Jersey Center for Pervasive Information Technologies.

This paper was presented in part at the 3rd International Symposium on
Turbo Codes & Related Topics, Brest, France, Sept. 1-5, 2003, and in part at
the 39th Annual Conference on Information Sciences and Systems, Baltimore,
USA, March 16-18, 2005.

The authors are with the Department of Electrical Engineering,
Princeton University, Princeton, NJ 08544. Email: {chihw, kulkarni,
poor}@princeton.edu

Due to their simple arithmetic structure, completely parallel 
decoding algorithms, excellent error correcting capability [7], 
and acceptable encoding complexity [8], [9], LDPC codes 
have been widely and successfully applied to different chan- 
nels, including binary erasure channels (BECs) [10], [11], 
[12], binary symmetric channels (BSCs), binary-input addi- 
tive white Gaussian noise channels (BiAWGNCs) [3], [13], 
Rayleigh fading channels [14], Markov channels [15], par- 
tial response channels/intersymbol interference channels [16], 
[17], [18], [19], dirty paper coding [20], and bit-interleaved 
coded modulation [21]. Except for the finite-length analysis 
of LDPC codes over the BEC [22], the analysis of iterative 
message-passing decoding algorithms is asymptotic (when the 
block length tends to infinity) [13], [23]. Under the optimal 
maximum-likelihood (ML) decoding algorithm, both the finite- 
length analysis and the asymptotic analysis for LDPC codes 
and other ensembles of turbo-like codes become tractable and 
rely on the weight distribution of these ensembles (see e.g. 
[24], [25], and [26]). Various Gallager type bounds on ML 
decoders for different finite LDPC code ensembles have been 
established in [27]. 

In essence, the density evolution method proposed by 
Richardson et al. in [13] is an asymptotic analytical tool 
for LDPC codes. As the codeword length tends to infinity, 
the random codebook will be more and more likely to be 
cycle-free, under which condition the input messages of each 
node are independent. Therefore the probability density of 
messages passed can be computed iteratively. A performance 
concentration theorem and a cycle-free convergence theorem, 
providing the theoretical foundation of the density evolution 
method, are proved in [13]. The behavior of codes with block 
length > 10^4 is well predicted by this technique, and thus
degree optimization for LDPC codes becomes tractable. Near 
optimal LDPC codes have been found in [7] and [23]. In [16] 
Kavcic et al. generalized the density evolution method to in- 
tersymbol interference channels, by introducing the ensemble 
of coset codes, i.e. the parity check equations are randomly 
selected as even or odd parities. Kavcic et al. also proved the 
corresponding fundamental theorems for the new coset code 
ensemble. 

Because of the symmetry of the BP algorithm and the 
symmetry of parity check constraints in LDPC codes, the 
decoding error probability will be independent of the trans- 
mitted codeword in the symmetric channel setting. Thus, 
in [13], an all-zero transmitted codeword is assumed and
the probability density of the messages passed depends only 
on the noise distribution. Nevertheless, in symbol-dependent 
asymmetric channels, which are the subject of this paper, 
the noise distribution is codeword-dependent, and thus some 
codewords are more noise-resistant than others. As a result, 
the all-zero codeword cannot be assumed. Instead of using 
a larger coset code ensemble as in [16], we circumvent 
this problem by averaging over all valid codewords, which 
is straightforward and has practical interpretations as the 
averaged error probability. Our results apply to all binary input, 
memoryless, symbol-dependent channels (e.g., z-channels, bi- 
nary asymmetric channels (BASCs), composite binary-input 
white Gaussian channels (composite BiAWGNCs), etc.) and 
can be generalized to LDPC codes over GF(q) or Z_m [28],
[29], [30]. The theorem of convergence to perfect projec- 
tion is provided to justify this codeword-averaged approach 
in conjunction with the existing theorems. New results on 
monotonicity, symmetry, stability (a necessary and a sufficient 
condition), and convergence rate analysis of the codeword- 
averaged density evolution method are also provided. Our 
approach based on the linear code ensemble (see footnote 1) will be
linked to that of the coset code ensemble [16] by proving the typicality
of linear LDPC codes when the minimum check node degree
is sufficiently large, which was first conjectured in [21]. All
of the above generalizations are based on the convergence 
to perfect projection, which will serve also as a theoretical 
foundation for the belief propagation algorithms even when 
only symmetric channels are considered. 

This paper is organized as follows. The formulations of
and background on channel models, LDPC code ensembles,
the belief propagation algorithm, and density evolution are
provided in Section II. In Section III, an iterative formula is
developed for computing the evolution of the codeword-averaged
probability density. In Section IV, we state the theorem of
convergence to perfect projection, which justifies the iterative
formula. A detailed proof will be given in Appendix I.
Monotonicity, symmetry, and stability theorems are stated
and proved in Section V. Section VI consists of simulations
and discussion of possible applications of our new density
evolution method. Section VII proves the typicality of linear
LDPC codes and revisits belief propagation for symmetric
channels. Section VIII concludes the paper.

II. Formulations 

A. Symbol-dependent Non-symmetric Channels 

The memoryless, symbol-dependent channels we consider
here are modeled as follows. Let $\mathbf{x}$ and $\mathbf{y}$ denote a transmitted
codeword vector and a received signal vector of codeword
length $n$, where $x_i$ and $y_i$ are the $i$-th transmitted symbol and
received signal, respectively, taking values in GF(2) and the
reals, respectively. The channel is memoryless and is specified
by the conditional probability density function
$f_{\mathbf{y}|\mathbf{x}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} f(y_i|x_i)$.
Two common examples are as follows.

1 LDPC codes are, by definition, linear codes since only even parity
check equations are considered. Nonetheless, by taking both even and odd
parity check equations into consideration, the extended LDPC "coset" code
has been proven to have important practical and theoretical value in many
applications [16]. To be explicit on whether only even parity-check equations
are considered or an extended set of parity-check equations is involved, two
terms, "linear LDPC codes" and "LDPC coset codes," will be used whenever
a comparison is made, even though the adjective, linear, is redundant for
traditional LDPC codes.

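• Example 1: [Binary Asymmetric Channels (BASC)]

$$f(y|x) = \begin{cases} (1-\epsilon_0)\,\delta(y) + \epsilon_0\,\delta(y-1) & \text{if } x = 0 \\ \epsilon_1\,\delta(y) + (1-\epsilon_1)\,\delta(y-1) & \text{if } x = 1 \end{cases}$$

where $\epsilon_0$, $\epsilon_1$ are the crossover probabilities and $\delta(y)$ is the
Dirac delta function. Note: if $\epsilon_0 = 0$, the above collapses
to the z-channel.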

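• Example 2: [Composite BiAWGNCs]

$$f(y|x) = \begin{cases} \frac{1}{\sqrt{8\pi\sigma^2}} \left( e^{-\frac{(y-3/\sqrt{5})^2}{2\sigma^2}} + e^{-\frac{(y+3/\sqrt{5})^2}{2\sigma^2}} \right) & \text{if } x = 0 \\ \frac{1}{\sqrt{8\pi\sigma^2}} \left( e^{-\frac{(y-1/\sqrt{5})^2}{2\sigma^2}} + e^{-\frac{(y+1/\sqrt{5})^2}{2\sigma^2}} \right) & \text{if } x = 1 \end{cases}$$

which corresponds to a bit-level sub-channel of the
4 pulse amplitude modulation (4PAM) with Gray mapping.
Here $\sigma^2$ denotes the noise variance.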
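As a rough numerical illustration of Example 2, the following Python sketch evaluates the bit-level density of the composite BiAWGNC. The function name `composite_biawgn_pdf`, the parameter `sigma`, and the equal-weight mixture normalization are assumptions made for illustration; only the constellation points $\pm 3/\sqrt{5}$ (for x = 0) and $\pm 1/\sqrt{5}$ (for x = 1) are taken from the example.

```python
import numpy as np

def composite_biawgn_pdf(y, x, sigma):
    """Density f(y|x) of the assumed composite BiAWGNC.

    Assumption: each bit value selects one of two equiprobable 4PAM
    points (+a or -a), observed in Gaussian noise of std `sigma`.
    """
    a = (3.0 if x == 0 else 1.0) / np.sqrt(5.0)      # outer points for x=0, inner for x=1
    norm = 1.0 / np.sqrt(8.0 * np.pi * sigma ** 2)   # (1/2) * Gaussian normalization
    return norm * (np.exp(-(y - a) ** 2 / (2 * sigma ** 2))
                   + np.exp(-(y + a) ** 2 / (2 * sigma ** 2)))

def composite_biawgn_llr(y, sigma):
    """Single-observation LLR ln f(y|0)/f(y|1) under the same assumptions."""
    return np.log(composite_biawgn_pdf(y, 0, sigma) / composite_biawgn_pdf(y, 1, sigma))
```

The channel is not symmetric in the usual sense: both conditional densities are even functions of y, so f(y|0) and f(-y|1) differ, which is why the all-zero codeword assumption is unavailable here.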

B. Linear LDPC Code Ensembles 

The linear LDPC codes of length n are actually a special 
family of parity check codes, such that all codewords can 
be specified by the following even parity check equation in 
GF(2): 

Ax = 0, 

where A is an m x n sparse matrix in GF(2) with the number 
of non-zero elements linearly proportional to n. To facilitate 
our analysis, we use a code ensemble rather than a fixed code. 
Our linear code ensemble is generated by equiprobable edge 
permutations in a regular bipartite graph. 

As illustrated in Fig. 1, the bipartite graph model consists
of a bottom row of variable nodes (corresponding to codeword
bits) and a top row of check nodes (corresponding to parity
check equations). Suppose we have $n$ variable nodes on the
bottom and each of them has $d_v$ sockets. There are $m := n d_v / d_c$
check nodes on the top and each of them has $d_c$ sockets. With
these fixed $(n + m)$ nodes, there are a total of $(n d_v)!$ possible
configurations obtained by connecting these $n d_v = m d_c$
sockets on each side, assuming all sockets are distinguishable (footnote 2).
The resulting graphs (multigraphs) will be regular and bipartite
with degrees denoted by $(d_v, d_c)$, and can be mapped to parity
check codes with the convention that the variable bit $v$ is
involved in parity check equation $c$ if and only if the variable
node $v$ and the check node $c$ are connected by an odd number
of edges. We consider a regular code ensemble $\mathcal{C}^n(d_v, d_c)$
putting equal probability on each of the possible configurations
of the regular bipartite graphs described above. One realization
of the codebook ensemble $\mathcal{C}^6(2, 3)$ is shown in Fig. 1. For
practical interest, we assume $d_c > 2$.
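The socket construction above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sample_regular_ldpc` and the list-of-lists matrix representation are hypothetical, and a uniform shuffle of the check-side sockets is used to draw one of the $(n d_v)!$ socket matchings equiprobably.

```python
import random

def sample_regular_ldpc(n, dv, dc, seed=None):
    """Draw one parity check matrix from the (dv, dc)-regular ensemble.

    Returns an m x n 0/1 matrix (list of rows) with m = n*dv/dc.
    An entry is 1 iff the variable and check node are joined by an
    odd number of edges, following the convention in the text.
    """
    if (n * dv) % dc != 0:
        raise ValueError("n*dv must be divisible by dc")
    m = n * dv // dc
    rng = random.Random(seed)
    # One list entry per socket; the entry records the node owning the socket.
    var_sockets = [i for i in range(n) for _ in range(dv)]
    chk_sockets = [j for j in range(m) for _ in range(dc)]
    rng.shuffle(chk_sockets)  # uniform random matching of sockets
    A = [[0] * n for _ in range(m)]
    for i, j in zip(var_sockets, chk_sockets):
        A[j][i] ^= 1  # parallel edges cancel in pairs over GF(2)
    return A

if __name__ == "__main__":
    for row in sample_regular_ldpc(6, 2, 3, seed=1):
        print(row)
```

Because multigraphs are allowed, a sampled row or column can have fewer ones than its node degree; only the parity of the degree is preserved.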

For each graph in $\mathcal{C}^n(d_v, d_c)$, the parity check matrix $A$
is an $m \times n$ matrix over GF(2), with $A_{j,i} = 1$ if and only
if there is an odd number of edges between variable node $i$
and check node $j$. Any valid codeword $\mathbf{x}$ satisfies the parity
check equation $A\mathbf{x} = \mathbf{0}$. For future use, we let $i$ and $j$ denote
the indices of the $i$-th variable node and the $j$-th check node.
$\{j_{i_0,c}\}_{c \in [1,d_v]}$ denotes all check nodes connecting to variable
node $i_0$, and similarly with $\{i_{j_0,v}\}_{v \in [1,d_c]}$.

2 When assuming all variable/check node sockets are indistinguishable, the
number of configurations can be upper bounded by [expression not recoverable].

Fig. 1. A realization of the code ensemble $\mathcal{C}^6(2,3)$.

Besides the regular graph case, we can also consider irregular
code ensembles. Let $\lambda$ and $\rho$ denote the finite order edge
degree distribution polynomials

$$\lambda(x) = \sum_k \lambda_k x^{k-1}, \qquad \rho(x) = \sum_k \rho_k x^{k-1},$$

where $\lambda_k$ or $\rho_k$ is the fraction of edges connecting to a
degree $k$ variable or check node, respectively. By assigning
equal probability to each possible configuration of irregular
bipartite graphs with degree distributions $\lambda$ and $\rho$ (similarly
to the regular case), we obtain the equiprobable, irregular,
bipartite graph ensemble $\mathcal{C}^n(\lambda, \rho)$. For example,
$\mathcal{C}^n(3,6) = \mathcal{C}^n(x^2, x^5)$.

C. Message Passing Algorithms & Belief Propagation

The message passing decoding algorithm is a distributed
algorithm such that each variable/check node has a processor,
which takes all incoming messages from its neighbors as
inputs, and outputs new messages back to all its neighbors.
The algorithm can be completely specified by the variable
and check node message maps, $\Psi_v$ and $\Psi_c$, which may or
may not be stationary (i.e., the maps remain the same as time
evolves) or uniform (i.e., node-independent). The message
passing algorithm can be executed sequentially or in parallel
depending on the order of the activations of different node
processors. Henceforth, we consider only parallel message
passing algorithms complying with the extrinsic principle
(adapted from turbo codes), i.e., the new message sent to
node $i$ (or $j$) does not depend on the received message from
the same node $i$ (or $j$) but depends only on other received
messages.

A belief propagation algorithm is a message passing
algorithm whose variable and check node message maps are
derived from Pearl's inference network [5]. Under the cycle-free
assumption on the inference network, belief propagation
calculates the exact marginal a posteriori probabilities, and
thus we obtain the optimal maximum a posteriori probability
(MAP) decisions. Let $m_0$ denote the initial message from
the variable nodes, and $\{m_k\}$ denote the messages from its
neighbors excluding that from the destination node. The entire
belief propagation algorithm with messages representing the
corresponding log likelihood ratio (LLR) is as follows:

$$m_0 := \ln \frac{P(y_i | x_i = 0)}{P(y_i | x_i = 1)}$$

$$\Psi_v(m_0, m_1, \cdots, m_{d_v-1}) := \sum_{j=0}^{d_v-1} m_j$$

$$\Psi_c(m_1, \cdots, m_{d_c-1}) := \ln \left( \frac{1 + \prod_{i=1}^{d_c-1} \tanh\frac{m_i}{2}}{1 - \prod_{i=1}^{d_c-1} \tanh\frac{m_i}{2}} \right). \qquad (2)$$

We note that the belief propagation algorithm is based only
on the cycle-free assumption (footnote 3) and is actually independent of
the channel model. The initial message $m_0$ depends only on
the single-bit LLR function and can be calculated under
non-symmetric $f(y_i|x_i)$. As a result, the belief propagation
algorithm remains the same for memoryless, symbol-dependent
channels.

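• Example: For BASCs,

$$m_0 = \begin{cases} \ln \frac{1-\epsilon_0}{\epsilon_1} & \text{if } y_i = 0 \\ \ln \frac{\epsilon_0}{1-\epsilon_1} & \text{if } y_i = 1 \end{cases}$$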



We assume that the belief propagation is executed in parallel 
and each iteration is a "round" in which all variable nodes send 
messages to all check nodes and then the check nodes send 
messages back. We use $l$ to denote the number of iterations
that have been executed. 
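The message maps and the BASC initial message described in this subsection can be sketched in Python as below. The function names are hypothetical, and the check node map is written as $2\,\mathrm{atanh}(\prod_i \tanh(m_i/2))$, which is algebraically the same as the logarithmic form of the check node rule. The handling of infinite LLRs (which arise for the z-channel, where one crossover probability is zero) is an assumption of this sketch.

```python
import math

def basc_initial_llr(y, eps0, eps1):
    """Initial message m_0 = ln f(y|x=0)/f(y|x=1) for a BASC output y in {0, 1}."""
    num = (1.0 - eps0) if y == 0 else eps0
    den = eps1 if y == 0 else (1.0 - eps1)
    if num == 0.0:
        return -math.inf  # y is impossible under x = 0
    if den == 0.0:
        return math.inf   # y is impossible under x = 1
    return math.log(num / den)

def psi_v(m0, incoming):
    """Variable node map: channel LLR plus the extrinsic check-to-variable messages."""
    return m0 + sum(incoming)

def psi_c(incoming):
    """Check node map (tanh rule) over the extrinsic variable-to-check messages."""
    prod = 1.0
    for m in incoming:
        prod *= math.tanh(m / 2.0)
    if abs(prod) >= 1.0:  # all inputs are certain; avoid atanh domain error
        return math.copysign(math.inf, prod)
    return 2.0 * math.atanh(prod)
```

Note that neither map refers to the channel law: the asymmetry enters only through `basc_initial_llr`, in line with the remark that belief propagation itself is unchanged.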

D. Density Evolution 

For a symmetric channel and any message-passing algo- 
rithm, the probability density of the transmitted messages in 
each iteration can be calculated iteratively with a concrete 
theoretical foundation [13]. The iterative formula and related 
theorems are termed "density evolution." Since the belief 
propagation algorithm performs extremely well under most 
circumstances and is of great importance, sometimes the term 
"density evolution" is reserved for the corresponding analytical 
method for belief propagation algorithms. 

III. Density Evolution: New Iterative Formula 

In what follows, we use the belief propagation algorithm as 
the illustrative example for our new iterative density evolution 
formula. 

With the assumption of channel symmetry and the inherent 
symmetry of the parity check equations in LDPC codes, 
the probability density of the messages in any symmetric 
message passing algorithm will be codeword independent, i.e., 
for different codewords, the densities of the messages passed 
differ only in parities, but all of them are of the same shape 
[Lemma 1, [13]]. 

In the symbol-dependent setting, symmetry of the channel 
may not hold. Even though the belief propagation mappings 
remain the same for asymmetric channels, the densities of the 
messages for different transmitted codewords are of different 
shapes and the density for the all-zero codeword cannot 
represent the behavior when other codewords are transmitted. 
To circumvent this problem, we average the density of the
messages over all valid codewords. However, directly averaging
over all codewords takes $2^{n-m}$ times more computations,
which ruins the efficiency of the iterative formula for density
evolution. Henceforth, we provide a new iterative formula
for the codeword-averaged density evolution which increases
the number of computations only by a constant factor; the
corresponding theoretical foundations are provided in this
section and in Section IV.

3 An implicit assumption will be revisited in Section VII-B.

Fig. 2. Illustrations of $\mathcal{N}^{2l}_{(i,j)}$ and $\mathbf{X}^{l}_{(i,j)}$, $l = 2$. The figure lists the
tree-satisfying strings 000, 011, 101, 110 for $(i,j) = (1,1)$.

A. Preliminaries

We consider the density of the message passed from variable
node $i$ to check node $j$. The probability density of this message
is denoted by $P^{(l)}_{(i,j)}(\mathbf{x})$, where the superscript $l$ denotes the
$l$-th iteration and the appended argument $\mathbf{x}$ denotes the actual
transmitted codeword. For example, $P^{(0)}_{(i,j)}(\mathbf{0})$ is the density of
the initial message $m_0$ from variable node $i$ to check node $j$
assuming the all-zero codeword is transmitted. $P^{(2)}_{(i,j)}(\mathbf{0})$ is the
density from $i$ to $j$ in the second iteration, and so on. We also
denote by $Q^{(l)}_{(j,i)}(\mathbf{x})$ the density of the message from check
node $j$ to variable node $i$ in the $l$-th iteration.

With the assumption that the corresponding graph is tree-like
until depth $2(l-1)$, we define the following quantities.
Fig. 2 illustrates these quantities for the code in Fig. 1 with
$i = j = 1$ and $l = 2$.

• $\mathcal{N}^{2l}_{(i,j)}$ denotes the tree-like subset of the graph
$G = (\mathcal{V}, \mathcal{E})$ (footnote 4) with root edge $(i,j)$ and depth $2(l-1)$, named as
the supporting tree. A formal definition is: $\mathcal{N}^{2l}_{(i,j)}$ is the
subgraph induced by $\mathcal{V}^{2l}_{(i,j)}$, where

$$\mathcal{V}^{2l}_{(i,j)} := \{ v \in \mathcal{V} : d(v,i) = d(v,j) - 1 \in [0, 2(l-1)] \}, \qquad (3)$$

where $d(v,i)$ is the shortest distance between node $v$ and
variable node $i$. In other words, $\mathcal{N}^{2l}_{(i,j)}$ is the depth $2(l-1)$
tree spanned from edge $(i,j)$. Let $|\mathcal{N}^{2l}_{(i,j)}|_V$ denote the
number of variable nodes in $\mathcal{N}^{2l}_{(i,j)}$ (including variable
node $i$). $|\mathcal{N}^{2l}_{(i,j)}|_C$ denotes the number of check nodes in
$\mathcal{N}^{2l}_{(i,j)}$ (check node $j$ is excluded by definition).

• $\mathbf{X} = \{\mathbf{x} \in \{0,1\}^n : A\mathbf{x} = \mathbf{0}\}$ denotes the set of all
valid codewords, and the information source selects each
codeword equiprobably from $\mathbf{X}$.

• $\mathbf{x}|_i$ and $\mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}}$ are the projections of codeword $\mathbf{x} \in \mathbf{X}$
on bit $i$ and on the variable nodes in the supporting tree
$\mathcal{N}^{2l}_{(i,j)}$, respectively.

• $\mathbf{X}^{l}_{(i,j)}$ denotes the set of all strings of length $|\mathcal{N}^{2l}_{(i,j)}|_V$
satisfying the $|\mathcal{N}^{2l}_{(i,j)}|_C$ check node constraints in $\mathcal{N}^{2l}_{(i,j)}$;
$\mathbf{x}^{l}_{(i,j)}$ denotes any element of $\mathbf{X}^{l}_{(i,j)}$ (the subscript $(i,j)$ is
omitted if there is no ambiguity). The connection between
$\mathbf{X}$, the valid codewords, and $\mathbf{X}^{l}_{(i,j)}$, the tree-satisfying
strings, will be clear in the following remark and in
Definition 1.

• For any set of codewords (or strings) $\mathbf{W}$, the average
operator $\langle \cdot \rangle_{\mathbf{W}}$ is defined as:

$$\langle g(\mathbf{x}) \rangle_{\mathbf{W}} = \frac{1}{|\mathbf{W}|} \sum_{\mathbf{x} \in \mathbf{W}} g(\mathbf{x}).$$

• With a slight abuse of notation for $P^{(l)}_{(i,j)}(\mathbf{x})$, we define

$$P^{(l)}_{(i,j)}(x) := \left\langle P^{(l)}_{(i,j)}(\mathbf{x}) \right\rangle_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_i = x\}}$$

$$P^{(l)}_{(i,j)}(\mathbf{x}^{l}) := \left\langle P^{(l)}_{(i,j)}(\mathbf{x}) \right\rangle_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}} = \mathbf{x}^{l}\}}$$

Namely, $P^{(l)}_{(i,j)}(x)$ and $P^{(l)}_{(i,j)}(\mathbf{x}^{l})$ denote the density
averaged over all compatible codewords with projections
being $x$ and $\mathbf{x}^{l}$, respectively.

4 The calligraphic $\mathcal{V}$ in $G = (\mathcal{V}, \mathcal{E})$ denotes the set of all vertices, including
both variable nodes and check nodes. Namely, a node $v \in \mathcal{V}$ can be a
variable/check node.
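Remark: For any tree-satisfying string $\mathbf{x}^{l} \in \mathbf{X}^{l}_{(i,j)}$, there
may or may not be a codeword $\mathbf{x}$ with projection
$\mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}} = \mathbf{x}^{l}$, since the codeword $\mathbf{x}$ must satisfy all check
nodes, but the string $\mathbf{x}^{l}$ needs to satisfy only $|\mathcal{N}^{2l}_{(i,j)}|_C$
constraints. Those check nodes outside $\mathcal{N}^{2l}_{(i,j)}$ may limit the
projected space $\mathbf{X}|_{\mathcal{N}^{2l}_{(i,j)}}$ to a strict subset of $\mathbf{X}^{l}_{(i,j)}$. For
example, the second row of $A\mathbf{x} = \mathbf{0}$ in Fig. 1 implies $x_6 = 0$.
Therefore two of the four elements of $\mathbf{X}^{l}_{(i,j)}$ in Fig. 2 are
invalid/impossible projections of $\mathbf{x} \in \mathbf{X}$ on $\mathcal{N}^{2l}_{(i,j)}$. Thus
$\mathbf{X}|_{\mathcal{N}^{2l}_{(i,j)}}$ is a proper subset of $\mathbf{X}^{l}_{(i,j)}$.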

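To capture this phenomenon, we introduce the notion of a
perfectly projected $\mathcal{N}^{2l}_{(i,j)}$.

Definition 1 (Perfectly Projected $\mathcal{N}^{2l}_{(i,j)}$): The supporting
tree $\mathcal{N}^{2l}_{(i,j)}$ is perfectly projected, if for any $\mathbf{x}^{l} \in \mathbf{X}^{l}_{(i,j)}$,

$$\frac{\left| \{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}} = \mathbf{x}^{l}\} \right|}{|\mathbf{X}|} = \frac{1}{|\mathbf{X}^{l}_{(i,j)}|}. \qquad (4)$$

That is, if we choose $\mathbf{x} \in \mathbf{X}$ equiprobably, $\mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}}$ will appear
uniformly among all elements in $\mathbf{X}^{l}_{(i,j)}$. Thus by looking
only at the projections on $\mathcal{N}^{2l}_{(i,j)}$, it is as if we are choosing $\mathbf{x}^{l}$
from $\mathbf{X}^{l}_{(i,j)}$ equiprobably and there are only $|\mathcal{N}^{2l}_{(i,j)}|_C$ check
node constraints and no others.

The example in Figs. 1 and 2 is obviously not perfectly
projected.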
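For very small codes, the perfect projection condition of Definition 1 can be checked by brute force. The Python sketch below is only an illustration: the names `codewords`, `tree_strings`, and `is_perfectly_projected` are hypothetical, the tree is passed in as explicit index lists, and each check node of the tree is assumed to have all of its neighbors among the tree's variable nodes.

```python
from itertools import product
from collections import Counter

def codewords(A):
    """All x in {0,1}^n with A x = 0 over GF(2); A is a list of 0/1 rows."""
    n = len(A[0])
    return [x for x in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(row, x)) % 2 == 0 for row in A)]

def tree_strings(A, tree_vars, tree_checks):
    """Strings on tree_vars satisfying only the check rows listed in tree_checks."""
    out = []
    for s in product((0, 1), repeat=len(tree_vars)):
        if all(sum(A[j][v] * b for v, b in zip(tree_vars, s)) % 2 == 0
               for j in tree_checks):
            out.append(s)
    return out

def is_perfectly_projected(A, tree_vars, tree_checks):
    """True iff every tree-satisfying string is the projection of the same
    number of codewords, i.e. a fraction 1/|X^l| of the codebook."""
    X = codewords(A)
    Xl = tree_strings(A, tree_vars, tree_checks)
    counts = Counter(tuple(x[v] for v in tree_vars) for x in X)
    return all(counts[s] * len(Xl) == len(X) for s in Xl)
```

For instance, with the single check x0 + x1 + x2 = 0 and a tree containing only variables 0 and 1, all four strings appear equally often among the codewords. Adding a second check that forces one tree bit to zero removes some projections, and the function returns False.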

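Since the message emitted from node $i$ to $j$ in the $l$-th
iteration depends only on the received signals of the
supporting tree, $\mathbf{y}|_{\mathcal{N}^{2l}_{(i,j)}}$, the codeword-dependent $P^{(l)}_{(i,j)}(\mathbf{x})$
actually depends only on the projection $\mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}}$, not on the
entire codeword $\mathbf{x}$. That is

$$P^{(l)}_{(i,j)}(\mathbf{x}) = P^{(l)}_{(i,j)}\left( \mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}} \right). \qquad (5)$$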
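An immediate implication of $\mathcal{N}^{2l}_{(i,j)}$ being a perfect projection
and (5) is

$$\begin{aligned} P^{(l)}_{(i,j)}(x) &:= \left\langle P^{(l)}_{(i,j)}(\mathbf{x}) \right\rangle_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_i = x\}} \\ &= \frac{1}{|\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_i = x\}|} \sum_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_i = x\}} P^{(l)}_{(i,j)}(\mathbf{x}) \\ &= \frac{1}{|\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_i = x\}|} \sum_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_{\mathcal{N}^{2l}_{(i,j)}} = \mathbf{x}^{l},\ \mathbf{x}^{l}|_i = x\}} P^{(l)}_{(i,j)}(\mathbf{x}^{l}) \\ &= \left\langle P^{(l)}_{(i,j)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} \in \mathbf{X}^{l}_{(i,j)} : \mathbf{x}^{l}|_i = x\}}. \end{aligned} \qquad (6)$$

Because of these two useful properties, (5) and (6), throughout
this subsection we assume that $\mathcal{N}^{2l}_{(i,j)}$ is perfectly projected.
The convergence of $\mathcal{N}^{2l}_{(i,j)}$ to a perfect projection in probability
is dealt with in Section IV. We will have all the preliminaries
necessary for deriving the new density evolution after
introducing the following self-explanatory lemma.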

Lemma 1 (Linearity of Density Transformation): For any
random variable $A$ with distribution $P_A$, if $g : A \mapsto g(A)$
is measurable, then $B = g(A)$ is a random variable with
distribution $P_B = T_g(P_A) := P_A \circ g^{-1}$. Furthermore, the
density transformation $T_g$ is linear. I.e., if $P_B = T_g(P_A)$ and
$Q_B = T_g(Q_A)$, then $\alpha P_B + (1-\alpha) Q_B = T_g(\alpha P_A + (1-\alpha) Q_A)$,
$\forall \alpha \in [0,1]$.



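B. New Formula

In the $l$-th iteration, the probability of sending an incorrect
message (averaged over all possible codewords) from variable
node $i_0$ to check node $j_0$ is

$$\begin{aligned} p_e^{(l)}(i_0, j_0) &= \frac{1}{|\mathbf{X}|} \left( \sum_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_{i_0} = 0\}} \int_{m=-\infty}^{0} P^{(l)}_{(i_0,j_0)}(\mathbf{x})(dm) + \sum_{\{\mathbf{x} \in \mathbf{X} : \mathbf{x}|_{i_0} = 1\}} \int_{m=0}^{\infty} P^{(l)}_{(i_0,j_0)}(\mathbf{x})(dm) \right) \\ &= \frac{1}{2} \left( \int_{m=-\infty}^{0} P^{(l)}_{(i_0,j_0)}(0)(dm) + \int_{m=0}^{\infty} P^{(l)}_{(i_0,j_0)}(1)(dm) \right). \end{aligned} \qquad (7)$$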



Motivated by (7), we concentrate on finding an iterative
formula for the density pair $P^{(l)}_{(i_0,j_0)}(0)$ and $P^{(l)}_{(i_0,j_0)}(1)$.
Throughout this section, we also assume $\mathcal{N}^{2l}_{(i_0,j_0)}$ is tree-like
(cycle-free) and perfectly projected.

Let $1_{\{\cdot\}}$ denote the indicator function. By an auxiliary
function $\gamma(m)$:

$$\gamma(m) := \left( 1_{\{m \le 0\}},\ \ln\coth\left|\frac{m}{2}\right| \right), \qquad (8)$$

and letting the domain of the first coordinate of $\gamma(m)$ be
GF(2), Eq. (2) for $\Psi_c$ can be written as

$$\Psi_c(m_1, \cdots, m_{d_c-1}) = \gamma^{-1}\left( \sum_{v=1}^{d_c-1} \gamma(m_v) \right). \qquad (9)$$
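The equivalence between the $\gamma$-domain form of the check node map and the tanh rule can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, the sign bit is added over GF(2) with XOR, and the inverse map uses the fact that $r \mapsto \ln\coth(r/2)$ is its own inverse on the positive reals. Messages equal to zero are excluded, since $\ln\coth 0$ is infinite.

```python
import math

def gamma(m):
    """gamma(m) = (sign bit in GF(2), ln coth|m/2|) for a nonzero LLR m."""
    return (1 if m <= 0 else 0, math.log(1.0 / math.tanh(abs(m) / 2.0)))

def gamma_inv(s, r):
    """Inverse of gamma: magnitude ln coth(r/2), negated when the sign bit is 1."""
    mag = math.log(1.0 / math.tanh(r / 2.0))
    return -mag if s else mag

def psi_c_gamma(msgs):
    """Check node map computed as gamma^{-1}(sum of gamma(m_v))."""
    s, r = 0, 0.0
    for m in msgs:
        bit, mag = gamma(m)
        s ^= bit   # addition over GF(2)
        r += mag   # ordinary addition of the real parts
    return gamma_inv(s, r)

def psi_c_tanh(msgs):
    """Reference implementation: the tanh rule written with atanh."""
    return 2.0 * math.atanh(math.prod(math.tanh(m / 2.0) for m in msgs))
```

Turning the check node product into a sum is what lets the density of the output be obtained by convolving the transformed input densities.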




Fig. 3. Illustrations of various quantities used in Section III.



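By (9) and the independence among the input messages,
the classical density evolution for belief propagation
algorithms (Eq. (9) in [23]) is as follows.

$$P^{(l)}_{(i_0,j_0)}(\mathbf{x}) = P^{(0)}_{(i_0,j_0)}(\mathbf{x}) \otimes \left( \bigotimes_{c=1}^{d_v-1} Q^{(l-1)}_{(j_{i_0,c},\, i_0)}(\mathbf{x}) \right) \qquad (10)$$

$$Q^{(l-1)}_{(j_{i_0,c},\, i_0)}(\mathbf{x}) = \Gamma^{-1}\left( \bigotimes_{v=1}^{d_c-1} \Gamma\left( P^{(l-1)}_{(i_{j_{i_0,c},v},\, j_{i_0,c})}(\mathbf{x}) \right) \right) \qquad (11)$$

where $\otimes$ denotes the convolution operator on probability
density functions, which can be implemented efficiently using
the Fourier transform. $\Gamma := T_\gamma$ is the density transformation
functional based on $\gamma$, defined in Lemma 1. Fig. 3 illustrates
many helpful quantities used in (10), (11), and throughout this
section.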

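By (5), (10), and the perfect projection assumption, we have

$$P^{(l)}_{(i_0,j_0)}(\mathbf{x}^{l}) = P^{(0)}_{(i_0,j_0)}(\mathbf{x}^{l}) \otimes \left( \bigotimes_{c=1}^{d_v-1} Q^{(l-1)}_{(j_{i_0,c},\, i_0)}(\mathbf{x}^{l}) \right). \qquad (12)$$

Further simplification can be made such that

$$\begin{aligned} P^{(l)}_{(i_0,j_0)}(x) &\overset{(a)}{=} \left\langle P^{(l)}_{(i_0,j_0)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} \\ &\overset{(b)}{=} \left\langle P^{(0)}_{(i_0,j_0)}(\mathbf{x}^{l}) \otimes \left( \bigotimes_{c=1}^{d_v-1} Q^{(l-1)}_{(j_{i_0,c},\, i_0)}(\mathbf{x}^{l}) \right) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} \\ &\overset{(c)}{=} P^{(0)}_{(i_0,j_0)}(x) \otimes \left\langle \bigotimes_{c=1}^{d_v-1} Q^{(l-1)}_{(j_{i_0,c},\, i_0)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} \\ &\overset{(d)}{=} P^{(0)}_{(i_0,j_0)}(x) \otimes \left( \bigotimes_{c=1}^{d_v-1} \left\langle Q^{(l-1)}_{(j_{i_0,c},\, i_0)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} \right) \\ &\overset{(e)}{=} P^{(0)}_{(i_0,j_0)}(x) \otimes \left( \left\langle Q^{(l-1)}_{(j_{i_0,1},\, i_0)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} \right)^{\otimes (d_v-1)}, \end{aligned} \qquad (13)$$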



where (a) follows from (6), (b) follows from (12), and (c)
follows from the linearity of convolutions. The fact that the
sub-trees generated by edges $(j_{i_0,c}, i_0)$ are completely disjoint
implies that, by the perfect projection assumption on $\mathcal{N}^{2l}_{(i_0,j_0)}$,
the distributions of strings on different sub-trees are
independent. As a result, the average of the convolutional products
(over these strings) equals the convolution of the averaged
distributions, yielding (d). Finally (e) follows from the fact
that the distributions of messages from different subtrees are
identical according to the perfect projection assumption.

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 8, AUGUST 2005

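To simplify $\left\langle Q^{(l-1)}_{(j_{i_0,1},\, i_0)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}}$, we need to define
some new notation. We use $j_1$ to represent $j_{i_0,1}$ for simplicity.
Denote by $\left\{ \mathcal{N}^{2(l-1)}_{(i_{j_1,v},\, j_1)} \right\}_{v \in [1, d_c-1]}$ the collection of all $d_c - 1$
subtrees rooted at $(i_{j_1,v}, j_1)$, $v \in [1, d_c - 1]$, and by $\mathbf{X}^{l-1}_{(i_{j_1,v},\, j_1)}$
the strings compatible to $\mathcal{N}^{2(l-1)}_{(i_{j_1,v},\, j_1)}$. We can then consider

$$\mathbf{X}^{1}(x) = \left\{ (x_1, \cdots, x_{d_c-1}) : \left( \sum_{v=1}^{d_c-1} x_v \right) + x = 0 \right\}$$

containing the strings satisfying parity check constraint $j_1$
given $x_{i_0} = x$, and

$$\mathbf{X}^{l-1}(x_1, \cdots, x_{d_c-1}) = \left\{ \mathbf{x}^{l-1}_{(i_{j_1,1},\, j_1)} \cdots \mathbf{x}^{l-1}_{(i_{j_1,d_c-1},\, j_1)} : \mathbf{x}^{l-1}_{(i_{j_1,v},\, j_1)} \Big|_{i_{j_1,v}} = x_v,\ \forall v \in [1, d_c-1] \right\}$$

is the collection of the concatenations of substrings, in which
the leading symbols of the substrings are $(x_1, \cdots, x_{d_c-1})$. All
these quantities are illustrated in Fig. 3.

Note the following two properties: (i) For any $v$, the
message $m_v$ from variable $i_{j_1,v}$ to check node $j_1$ depends only
on $\mathbf{x}^{l-1}_{(i_{j_1,v},\, j_1)}$; and (ii) With the leading symbols $\{x_v\}_{v \in [1, d_c-1]}$
fixed and the perfect projection assumption, the projections
on the strings $\left\{ \mathbf{x}^{l-1}_{(i_{j_1,v},\, j_1)} \right\}$ are independent, and
thus the averaged convolution of densities is equal to the
convolution of the averaged densities. By repeatedly applying
Lemma 1 and the above two properties, we have

$$\begin{aligned} \left\langle Q^{(l-1)}_{(j_{i_0,1},\, i_0)}(\mathbf{x}^{l}) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} &= \Gamma^{-1}\left( \left\langle \bigotimes_{v=1}^{d_c-1} \Gamma\left( P^{(l-1)}_{(i_{j_1,v},\, j_1)}\left( \mathbf{x}^{l-1}_{(i_{j_1,v},\, j_1)} \right) \right) \right\rangle_{\{\mathbf{x}^{l} : \mathbf{x}^{l}|_{i_0} = x\}} \right) \\ &= \Gamma^{-1}\left( \left\langle \bigotimes_{v=1}^{d_c-1} \Gamma\left( \left\langle P^{(l-1)}_{(i_{j_1,v},\, j_1)}\left( \mathbf{x}^{l-1}_{(i_{j_1,v},\, j_1)} \right) \right\rangle_{\mathbf{X}^{l-1}(x_1, \cdots, x_{d_c-1})} \right) \right\rangle_{\mathbf{X}^{1}(x)} \right) \\ &= \Gamma^{-1}\left( \frac{1}{2^{d_c-2}} \sum_{\mathbf{X}^{1}(x)} \bigotimes_{v=1}^{d_c-1} \Gamma\left( P^{(l-1)}_{(i_{j_1,v},\, j_1)}(x_v) \right) \right). \end{aligned} \qquad (14)$$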



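By (13), (14), and dropping the subscripts during the density
evolution, a new density evolution formula for $P^{(l)}(x)$, $\forall x = 0, 1$,
is as follows.

$$P^{(l)}(x) = P^{(0)}(x) \otimes \left( Q^{(l-1)}(x) \right)^{\otimes (d_v-1)}$$

$$Q^{(l-1)}(x) = \Gamma^{-1}\left( \frac{1}{2^{d_c-2}} \sum_{\mathbf{X}^{1}(x)} \bigotimes_{v=1}^{d_c-1} \Gamma\left( P^{(l-1)}(x_v) \right) \right)$$

With the help of the linearity of distribution transformations
and convolutions, the above can be further simplified and the
desired efficient iterative formulae become:

$$P^{(l)}(x) = P^{(0)}(x) \otimes \left( Q^{(l-1)}(x) \right)^{\otimes (d_v-1)}$$

$$Q^{(l-1)}(x) = \Gamma^{-1}\left( \left( \Gamma\left( \frac{P^{(l-1)}(0) + P^{(l-1)}(1)}{2} \right) \right)^{\otimes (d_c-1)} + (-1)^{x} \left( \Gamma\left( \frac{P^{(l-1)}(0) - P^{(l-1)}(1)}{2} \right) \right)^{\otimes (d_c-1)} \right)$$

The above formula can be easily generalized to the irregular
code ensembles $\mathcal{C}^n(\lambda, \rho)$:

$$P^{(l)}(x) = P^{(0)}(x) \otimes \lambda\left( Q^{(l-1)}(x) \right)$$

$$Q^{(l-1)}(x) = \Gamma^{-1}\left( \rho\left( \Gamma\left( \frac{P^{(l-1)}(0) + P^{(l-1)}(1)}{2} \right) \right) + (-1)^{x} \rho\left( \Gamma\left( \frac{P^{(l-1)}(0) - P^{(l-1)}(1)}{2} \right) \right) \right), \qquad (15)$$

which has the same complexity as the classical density
evolution for symmetric channels.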
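To make the codeword-averaged recursion concrete, the following Python sketch tracks the density pair for a regular ensemble over a BASC by Monte Carlo sampling. It swaps in a sampled (population) approximation for the convolution and $\Gamma$-transform computation, so it illustrates the recursion rather than the efficient formula itself. At each check node the neighboring bits are drawn uniformly from the tuples whose parity matches the outgoing bit $x$, which corresponds to averaging over the parity-satisfying neighbor bit tuples. All names (`sampled_de`, `pop`, and so on) and the clipping constant are assumptions of this sketch.

```python
import numpy as np

def _channel_llr(x, e0, e1, rng):
    """Sample BASC outputs for bits x and return the LLRs ln f(y|0)/f(y|1)."""
    flip = rng.random(x.shape) < np.where(x == 0, e0, e1)
    y = np.where(flip, 1 - x, x)
    return np.where(y == 0, np.log((1 - e0) / e1), np.log(e0 / (1 - e1)))

def _error_prob(P):
    """Codeword-averaged error probability: wrong sign given x=0 or x=1."""
    return 0.5 * (np.mean(P[0] < 0) + np.mean(P[1] > 0))

def sampled_de(dv, dc, e0, e1, iters, pop=20000, seed=0):
    """Sampled codeword-averaged density evolution on a (dv, dc)-regular code.

    P[x] holds samples of variable-to-check messages given the bit value x.
    Requires 0 < e0, e1 < 1 so that all LLRs are finite.
    """
    rng = np.random.default_rng(seed)
    P = [_channel_llr(np.full(pop, x), e0, e1, rng) for x in (0, 1)]
    errs = [_error_prob(P)]
    for _ in range(iters):
        Q = []
        for x in (0, 1):
            # Neighbor bits: uniform over tuples whose sum equals x modulo 2.
            bits = rng.integers(0, 2, size=(pop, dc - 1))
            bits[:, -1] = (bits[:, :-1].sum(axis=1) + x) % 2
            idx = rng.integers(0, pop, size=(pop, dc - 1))
            msgs = np.where(bits == 0, P[0][idx], P[1][idx])
            prod = np.prod(np.tanh(msgs / 2.0), axis=1)
            prod = np.clip(prod, -1 + 1e-12, 1 - 1e-12)  # keep arctanh finite
            Q.append(2.0 * np.arctanh(prod))
        P = []
        for x in (0, 1):
            m0 = _channel_llr(np.full(pop, x), e0, e1, rng)
            idx = rng.integers(0, pop, size=(pop, dv - 1))
            P.append(m0 + Q[x][idx].sum(axis=1))
        errs.append(_error_prob(P))
    return errs

if __name__ == "__main__":
    print(sampled_de(3, 6, 0.01, 0.05, iters=15))
```

For example, `sampled_de(3, 6, 0.01, 0.05, iters=15)` returns the estimated error probability after each iteration for the (3,6) ensemble. The estimates are noisy and depend on `pop` and `seed`.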

Remark: The above derivation relies heavily on the perfect 
projection assumption, which guarantees that uniformly aver- 
aging over all codewords is equivalent to uniformly averaging 
over the tree-satisfying strings. Since the tree-satisfying strings 
are well-structured and symmetric, we are on solid ground 
to move the average inside the classical density evolution 
formula. 

IV. Density Evolution: Fundamental Theorems 

As stated in Section III, the tree-like until depth $2l$ and the
perfect projection assumptions are critical in our analysis. The
use of codeword ensembles rather than fixed codes facilitates 
the analysis but its relationship to fixed codes needs to be 
explored. We restate two necessary theorems from [13], and 
give a novel perfect projection convergence theorem, which 
is essential to our new density evolution method. With these 
theorems, a concrete theoretical foundation will be established. 

Theorem 1 (Convergence to the Cycle-Free Case, [13]):
Fix $l$, $i_0$, and $j_0$. For any $(d_v, d_c)$, there exists a constant
$\alpha > 0$, such that for all $n \in \mathbb{N}$, the code ensemble $\mathcal{C}^n(d_v, d_c)$
satisfies

$$P\left( \mathcal{N}^{2l}_{(i_0,j_0)} \text{ is cycle-free} \right) \ge 1 - \alpha\, \frac{\{(d_v-1)(d_c-1)\}^{2l}}{n},$$

where $\mathcal{N}^{2l}_{(i_0,j_0)}$ is the support tree as defined by (3).
Theorem 2 (Convergence to Perfect Projection in Prob.):
Fix $l$, $i_0$, and $j_0$. For any regular, bipartite, equiprobable
graph ensemble $\mathcal{C}^n(d_v, d_c)$, we have

$$P\left( \mathcal{N}^{2l}_{(i_0,j_0)} \text{ is perfectly projected} \right) = 1 - O(n^{-0.1}).$$

Remark: The above two theorems focus only on the properties
of equiprobable regular bipartite graph ensembles, and
are independent of the channel type of interest.

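Theorem 3 (Concentration to the Expectation, [13]): With
fixed transmitted codeword $\mathbf{x}$, let $Z$ denote the number of
wrong messages (those $m$'s such that $m(-1)^x < 0$). There
exists a constant $\beta > 0$ such that for any $\epsilon > 0$, over the code
ensemble $\mathcal{C}^n(d_v, d_c)$ and the channel realizations $\mathbf{y}$, we have

$$P\left( \left| \frac{Z - E\{Z\}}{n d_v} \right| > \epsilon \right) \le 2 e^{-\beta \epsilon^2 n}. \qquad (16)$$

Furthermore, $\beta$ is independent of $f_{\mathbf{y}|\mathbf{x}}(\mathbf{y}|\mathbf{x})$, and thus is
independent of $\mathbf{x}$.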

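Theorem 3 can easily be generalized to symbol-dependent
channels in the following corollary.

Corollary 1: Over the equiprobable codebook $\mathbf{X}$, the code
ensemble (footnote 5) $\mathcal{C}^n(d_v, d_c)$, and channel realizations $\mathbf{y}$, (16) still
holds.

Proof: Since the constant $\beta$ in Theorem 3 is independent
of the transmitted codeword $\mathbf{x}$, after averaging over the
equiprobable codebook $\mathbf{X}$, the inequality still holds. That is,

$$P\left( \left| \frac{Z}{n d_v} - \frac{E\{Z\}}{n d_v} \right| > \epsilon \right) = E_{\mathbf{x}}\left\{ P\left( \left| \frac{Z}{n d_v} - \frac{E\{Z\}}{n d_v} \right| > \epsilon \,\Big|\, \mathbf{x} \right) \right\} \le E_{\mathbf{x}}\left\{ 2 e^{-\beta \epsilon^2 n} \right\} = 2 e^{-\beta \epsilon^2 n}. \qquad ■$$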

Now we have all the prerequisites for proving the theoretical
foundation of our codeword-averaged density evolution.

Theorem 4 (Validity of Codeword-Averaged DE):
Consider any regular, bipartite, equiprobable graph ensemble
$\mathcal{C}^n(d_v, d_c)$ with fixed $l$, $i_0$, and $j_0$. $p_e^{(l)}(i_0, j_0)$ is derived
from (7) and the codeword-averaged density evolution after
$l$ iterations. The probability over equiprobable codebook $\mathbf{X}$,
the code ensemble $\mathcal{C}^n(d_v, d_c)$, and the channel realizations
$\mathbf{y}$, satisfies

$$P\left( \left| \frac{Z}{n d_v} - p_e^{(l)}(i_0, j_0) \right| > \epsilon \right) = e^{-\epsilon^2 O(n)}, \quad \forall \epsilon > 0.$$



5 The only valid codeword for all code instances of the ensemble is the
all-zero codeword. Therefore, a fixed bit string is in general not a valid codeword
for most instances of the code ensemble, which hampers the averaging over
the code ensemble. This, however, can be circumvented by the following
construction. We first use Gaussian elimination to index the codewords,
$1, \cdots, 2^{nR}$, for any code instance in the code ensemble. And we then fix the
index instead of the codeword. The statements and the proof of Theorem 3
hold verbatim after this slight modification.



Proof: We note that $\frac{Z}{n d_v}$ is bounded between 0 and 1.
By observing that

$$\frac{Z}{n d_v}\, 1_{\{\mathcal{N}^{2l}_{(i_0,j_0)} \text{ is cycle-free and perfectly projected}\}} \le \frac{Z}{n d_v} \le \left( \frac{Z}{n d_v} - 1 \right) 1_{\{\mathcal{N}^{2l}_{(i_0,j_0)} \text{ is cycle-free and perfectly projected}\}} + 1,$$

and using Theorems 1 and 2, we have $\lim_{n \to \infty} E\left\{ \frac{Z}{n d_v} \right\} = p_e^{(l)}(i_0, j_0)$.
Then by Corollary 1, the proof is complete. ■

The proof of Theorem 2 will be included in Appendix I.

V. MONOTONICITY, SYMMETRY, & STABILITY 

In this section, we prove the monotonicity, symmetry, 
and stability of our codeword-averaged density evolution 
method on belief propagation algorithms. Since the codeword- 
averaged density evolution reduces to the traditional one when 
the channel of interest is symmetric, the following theorems 
also reduce to those (in [23] and [13]) for symmetric channels. 

A. Monotonicity 

Proposition 1 (Monotonicity with Respect to $l$): Let $p_e^{(l)}$
denote the bit error probability of the codeword-averaged
density evolution defined in (7). Then $p_e^{(l+1)} \le p_e^{(l)}$ for all
$l \in \mathbb{N}$.

Proof: We first note that the codeword-averaged approach 
can be viewed as concatenating a bit-to-sequence random 
mapper with the observation channels, and the larger the tree- 
structure is, the more observation/information the decision 
maker has. Since the BP decoder is the optimal MAP de- 
coder for the tree structure of interest, the larger the tree is, 
the smaller the error probability will be. The proof is thus 
complete. ■ 

Proposition 2 (Monotonicity w.r.t. Degraded Channels):
Let $f(y|x)$ and $g(y|x)$ denote two different channel models,
such that $g(y|x)$ is degraded with respect to (w.r.t.) $f(y|x)$.
The corresponding decoding error probabilities, $p_{e,f}^{(l)}$ and $p_{e,g}^{(l)}$,
are defined in @. Then for any fixed $l$, we have $p_{e,f}^{(l)} \le p_{e,g}^{(l)}$.

Proof: By taking the same point of view that the 
codeword-averaged approach is a concatenation of a bit-to- 
sequence random mapper with the observation channels, this 
theorem can be easily proved by the channel degradation 
argument. ■ 

B. Symmetry 

We will now show that even though the evolved density 
is derived from non-symmetric channels, there are still some 
symmetry properties inherent in the symmetric structure of 
belief propagation algorithms. We define the symmetric dis- 
tribution pair as follows. 

Definition 2 (Symmetric Distribution Pairs): Two 
probability measures P and Q are a symmetric pair if 
for any integrable function h, we have 

$$\int h(m)\,dP(m) = \int e^{-m} h(-m)\,dQ(m).$$

A distribution $P_s$ is self-symmetric if $(P_s, P_s)$ is a symmetric
pair.

Proposition 3: Let $I(m) := -m$ be a parity reversing function,
and let $P^{(l)}(0)$ and $P^{(l)}(1)$ denote the resulting density
functions from the codeword-averaged density evolution. Then
$P^{(l)}(0)$ and $P^{(l)}(1) \circ I^{-1}$ are a symmetric pair for all $l \in \mathbb{N}$.
Remark: In the symmetric channel case, $P^{(l)}(0)$ and $P^{(l)}(1)$
differ only in parity (Lemma 1, [13]). Thus, $P^{(l)}(0) =
P^{(l)}(1) \circ I^{-1}$ is self-symmetric [Theorem 3 in [23]].

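Proof: We note that by the equiprobable codeword
distribution and the perfect projection assumption, $P^{(l)}(0)$ and
$P^{(l)}(1)$ act on the random variable $m$, given by

$$m := \ln\frac{P(x=0\,|\,\mathbf{y}^l)}{P(x=1\,|\,\mathbf{y}^l)} = \ln\frac{P(\mathbf{y}^l\,|\,x=0)}{P(\mathbf{y}^l\,|\,x=1)},$$

where $\mathbf{y}^l$ is the received signal on the subset $\mathcal{N}^{2l}$ and $P$
is the distribution over channel realizations and equiprobable
codewords. Then by a change of measure,

$$\begin{aligned}
\int h(m)\,P^{(l)}(0)(dm)
 &= E_{x=0}\left\{h\left(\ln\frac{P(\mathbf{y}^l|x=0)}{P(\mathbf{y}^l|x=1)}\right)\right\} \\
 &= E_{x=1}\left\{\frac{P(\mathbf{y}^l|x=0)}{P(\mathbf{y}^l|x=1)}\,h\left(\ln\frac{P(\mathbf{y}^l|x=0)}{P(\mathbf{y}^l|x=1)}\right)\right\} \\
 &= \int e^{m} h(m)\,P^{(l)}(1)(dm). \qquad (17)
\end{aligned}$$

Substituting $m \to -m$ on the right-hand side gives
$\int e^{-m} h(-m)\,(P^{(l)}(1)\circ I^{-1})(dm)$, which is the condition of
Definition 2. This completes the proof.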
Corollary 2:

$$\langle P^{(l)}\rangle := \frac{P^{(l)}(0) + P^{(l)}(1)\circ I^{-1}}{2}$$

is self-symmetric for all $l$, i.e. $(\langle P^{(l)}\rangle, \langle P^{(l)}\rangle)$ is a symmetric
pair.
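As a numerical illustration of Definition 2, Proposition 3, and Corollary 2, the following sketch checks the symmetric-pair identity at iteration $l = 0$ for a binary asymmetric channel. The helper names (`llr_atoms`, `flip`, `is_symmetric_pair`), the crossover values, and the three test functions $h$ are illustrative assumptions, not part of the original development.

```python
import math

def llr_atoms(eps0, eps1):
    """Return (P0, P1): LLR atoms (m, prob) observed under x = 0 and x = 1.

    Assumed channel: P(y=1 | x=0) = eps0 and P(y=0 | x=1) = eps1.
    """
    p_y_0 = [1.0 - eps0, eps0]      # P(y | x = 0) for y = 0, 1
    p_y_1 = [eps1, 1.0 - eps1]      # P(y | x = 1) for y = 0, 1
    llr = [math.log(a / b) for a, b in zip(p_y_0, p_y_1)]
    return list(zip(llr, p_y_0)), list(zip(llr, p_y_1))

def flip(dist):
    """Push a distribution through the parity-reversing map I(m) = -m."""
    return [(-m, p) for m, p in dist]

def integral(h, dist):
    """Left-hand side of Definition 2: integral of h(m) dP(m)."""
    return sum(p * h(m) for m, p in dist)

def tilted_integral(h, dist):
    """Right-hand side of Definition 2: integral of exp(-m) h(-m) dQ(m)."""
    return sum(p * math.exp(-m) * h(-m) for m, p in dist)

def is_symmetric_pair(P, Q, tests=(math.tanh, math.cos, lambda m: m * m)):
    """Check the identity of Definition 2 for a few test functions h."""
    return all(
        math.isclose(integral(h, P), tilted_integral(h, Q),
                     rel_tol=1e-9, abs_tol=1e-12)
        for h in tests
    )

P0, P1 = llr_atoms(eps0=0.00001, eps1=0.2305)
avg = [(m, 0.5 * p) for m, p in P0 + flip(P1)]   # <P^(0)> as a mixture

print(is_symmetric_pair(P0, flip(P1)))   # pair of Proposition 3
print(is_symmetric_pair(avg, avg))       # self-symmetry of Corollary 2
print(is_symmetric_pair(P0, P0))         # P(0) alone, asymmetric channel
```

Only finitely many test functions are checked, so this is a sanity check rather than a proof; the identity itself holds for every integrable $h$.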



C. Stability 



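Rather than looking only at the error probability $p_e^{(l)}$ of the
evolved densities $P^{(l)}(0)$ and $P^{(l)}(1)$, we also focus on its
Chernoff bound,

$$CBP^{(l)}(x) := \int e^{-(-1)^x \frac{m}{2}}\,P^{(l)}(x)(dm).$$

By letting $h(m) = e^{-m/2}$ and by (17), we have $CBP^{(l)}(0) =
CBP^{(l)}(1)$. The averaged $\langle CBP^{(l)}\rangle$ then becomes

$$\langle CBP^{(l)}\rangle := \frac{1}{2}\left(CBP^{(l)}(0) + CBP^{(l)}(1)\right)
  = CBP^{(l)}(0) = CBP^{(l)}(1)
  = \int e^{-\frac{m}{2}}\,\langle P^{(l)}\rangle(dm). \qquad (18)$$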



We state three properties which can easily be derived from the
self-symmetry of $\langle P^{(l)}\rangle$. Proofs can be found in [31], [23],
and [30].

• $\langle CBP^{(l)}\rangle = \min_s \int e^{-s\cdot m}\,\langle P^{(l)}\rangle(dm)$.
• The density of $e^{-m/2}\langle P^{(l)}\rangle(dm)$ is symmetric with
respect to $m = 0$.
• $2p_e^{(l)} \le \langle CBP^{(l)}\rangle \le 2\sqrt{p_e^{(l)}(1 - p_e^{(l)})}$. This justifies the
use of $\langle CBP^{(l)}\rangle$ as our performance measure.
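The third property can be checked directly at iteration $l = 0$ for a binary asymmetric channel, where both quantities have closed forms. The sketch below assumes the crossover convention $\epsilon_0 = P(y=1|x=0)$, $\epsilon_1 = P(y=0|x=1)$; the function names are hypothetical.

```python
import math

def bhattacharyya(eps0, eps1):
    """<CBP^(0)> = sum over y of sqrt(P(y|0) P(y|1))."""
    return math.sqrt((1.0 - eps0) * eps1) + math.sqrt(eps0 * (1.0 - eps1))

def error_probability(eps0, eps1):
    """p_e^(0): equiprobable input, decision by the sign of the LLR."""
    return 0.5 * (min(1.0 - eps0, eps1) + min(eps0, 1.0 - eps1))

def sandwich_holds(eps0, eps1, slack=1e-12):
    """Check 2 p_e <= <CBP> <= 2 sqrt(p_e (1 - p_e)) up to rounding slack."""
    pe = error_probability(eps0, eps1)
    cbp = bhattacharyya(eps0, eps1)
    upper = 2.0 * math.sqrt(pe * (1.0 - pe))
    return 2.0 * pe <= cbp + slack and cbp <= upper + slack

grid = [i / 20 for i in range(21)]
print(all(sandwich_holds(e0, e1) for e0 in grid for e1 in grid))
```

The lower bound follows from $\min(a,b) \le \sqrt{ab}$ and the upper bound from the Cauchy-Schwarz inequality, so the check should pass on the whole grid.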

Thus, we consider $\langle CBP^{(l)}\rangle$, the Chernoff bound of $p_e^{(l)}$.
With the regularity assumption that $\int_{\mathbb{R}} e^{sm}\,\langle P^{(0)}\rangle(dm) < \infty$
for all $s$ in some neighborhood of zero, we state the necessary
and sufficient stability conditions as follows.

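Theorem 5 (Sufficient Stability Condition): Let
$r := \langle CBP^{(0)}\rangle = \int_{\mathbb{R}} e^{-m/2}\,\langle P^{(0)}\rangle(dm)$. Suppose
$\lambda_2\rho'(1) r < 1$, and let $\epsilon^*$ be the smallest strictly positive root
of the following equation.

$$\lambda(\rho'(1)\epsilon)\,r = \epsilon.$$

If for some $l_0$, $\langle CBP^{(l_0)}\rangle < \epsilon^*$, then

$$\langle CBP^{(l)}\rangle =
\begin{cases}
O\left((\lambda_2\rho'(1) r)^{l}\right) & \text{if } \lambda_2 > 0, \\
O\left(e^{-O\left((k_\lambda - 1)^{l}\right)}\right) & \text{if } \lambda_2 = 0,
\end{cases}$$

where $k_\lambda = \min\{k : \lambda_k > 0\}$. In both cases, $\lambda_2 = 0$ and
$\lambda_2 > 0$, $\lim_{l\to\infty}\langle CBP^{(l)}\rangle = 0$ and
$\lim_{l\to\infty} p_e^{(l)} = 0$.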

Corollary 3: For any noise distribution $f(y|x)$ with Bhattacharyya
noise parameter $r := \langle CBP^{(0)}\rangle$, if there is no
$\epsilon \in (0, r)$ such that

$$\lambda(\rho'(1)\epsilon)\,r = \epsilon,$$

then $\mathcal{C}(\lambda, \rho)$ will have arbitrarily small bit error rate as $n$ tends
to infinity. The corresponding $r$ can serve as an inner bound of
the achievable region for general non-symmetric memoryless
channels. Further discussion of finite dimensional bounds on
the achievable region can be found in [30].
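To make the role of $\epsilon^*$ concrete, the sketch below computes the stability limit $1/(\lambda_2\rho'(1))$ and the smallest positive root of $\lambda(\rho'(1)\epsilon)\,r = \epsilon$ for the 12A degree distributions listed in Section VI-A. The grid scan followed by bisection, the choice $r = 0.5$, and all function names are assumptions made for illustration; the paper does not prescribe a root-finding method.

```python
import math

# Edge-perspective degree distributions of the 12A ensemble (Section VI-A),
# stored as {node degree k: coefficient of x**(k-1)}.
LAMBDA_12A = {2: 0.24426, 3: 0.25907, 4: 0.01054, 5: 0.05510,
              8: 0.01455, 10: 0.01275, 12: 0.40373}
RHO_12A = {7: 0.25475, 8: 0.73438, 9: 0.01087}

def poly(coeffs, x):
    """Evaluate sum_k c_k x^(k-1)."""
    return sum(c * x ** (k - 1) for k, c in coeffs.items())

def deriv_at_one(coeffs):
    """Derivative of sum_k c_k x^(k-1) evaluated at x = 1."""
    return sum(c * (k - 1) for k, c in coeffs.items())

def stability_limit(lam, rho):
    """Largest r allowed by lambda_2 rho'(1) r < 1 (infinite if lambda_2 = 0)."""
    lam2 = lam.get(2, 0.0)
    return math.inf if lam2 == 0.0 else 1.0 / (lam2 * deriv_at_one(rho))

def epsilon_star(lam, rho, r, steps=100_000):
    """Smallest positive root of lambda(rho'(1) e) r = e in (0, r], or None."""
    slope = deriv_at_one(rho)
    if lam.get(2, 0.0) * slope * r >= 1.0:
        raise ValueError("requires lambda_2 rho'(1) r < 1")
    g = lambda e: r * poly(lam, slope * e) - e   # negative just above zero
    prev = r / steps
    for i in range(2, steps + 1):
        cur = r * i / steps
        if g(prev) < 0.0 <= g(cur):              # first sign change
            lo, hi = prev, cur
            for _ in range(200):                 # bisection refinement
                mid = 0.5 * (lo + hi)
                if g(mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        prev = cur
    return None                                  # no root: Corollary 3 applies

print(stability_limit(LAMBDA_12A, RHO_12A))
print(epsilon_star(LAMBDA_12A, RHO_12A, r=0.5))
```

A return value of `None` corresponds to the situation of Corollary 3, in which no root exists in $(0, r)$. The scan resolution (`steps`) limits how closely spaced two roots can be before the first one is missed.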

Theorem 6 (Necessary Stability Condition): Let
$r := \langle CBP^{(0)}\rangle$. If $\lambda_2\rho'(1) r > 1$, then $\lim_{l\to\infty} p_e^{(l)} > 0$.

• Remark 1: $\langle CBP^{(0)}\rangle$ is the Bhattacharyya noise parameter
and is related to the cutoff rate $R_0$ by $R_0 =
1 - \log_2\left(1 + \langle CBP^{(0)}\rangle\right)$. Further discussion of $\langle CBP^{(0)}\rangle$
for turbo-like and LDPC codes can be found in [25], [31],
[30].

• Remark 2: The stability results are first stated in [23]
without the convergence rate statement and the stability
region $\epsilon^*$. Since we focus on general asymmetric channels
(with symmetric channels as a special case), our convergence
rate and stability region $\epsilon^*$ results also apply to the
symmetric channel case. Benefitting from considering its
Chernoff version, we will provide a simple proof, which
did not appear in [23].

• Remark 3: e* can be used as a stopping criterion for the 
iterations of the density evolution. Moreover, e* is lower 



bounded by x^^0(r> 
efficient substitute for e*. 



which is a computationally 



Proof of Theorem 5: We define the Chernoff bound
of the density of the messages emitting from check nodes,
$CBQ^{(l)}(x)$, in a fashion similar to $CBP^{(l)}(x)$:

$$CBQ^{(l)}(x) := \int e^{-(-1)^x \frac{m}{2}}\,Q^{(l)}(x)(dm).$$

First consider the case in which $d_c = 3$. We then have

$$\Psi_c(m_1, m_2) = \ln\frac{1 + \tanh\frac{m_1}{2}\tanh\frac{m_2}{2}}{1 - \tanh\frac{m_1}{2}\tanh\frac{m_2}{2}}
  = \ln\frac{1 + e^{m_1 + m_2}}{e^{m_1} + e^{m_2}}.$$



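To simplify the analysis, we assume the all-zero codeword
is transmitted and then generalize the results to non-zero
codewords. Suppose the distributions of $m_1$ and $m_2$ are
$P_1^{(l)}(0)$ and $P_2^{(l)}(0)$, respectively. Then $CBQ^{(l)}(0)$ becomes

$$\begin{aligned}
CBQ^{(l)}(0)
 &= \int e^{-\frac{\Psi_c(m_1, m_2)}{2}}\,P_1^{(l)}(0)(dm_1)\times P_2^{(l)}(0)(dm_2) \\
 &= \int \sqrt{\frac{e^{m_1} + e^{m_2}}{1 + e^{m_1 + m_2}}}\,P_1^{(l)}(0)(dm_1)\times P_2^{(l)}(0)(dm_2) \\
 &\le \int \sqrt{e^{-m_1} + e^{-m_2}}\,P_1^{(l)}(0)(dm_1)\times P_2^{(l)}(0)(dm_2) \\
 &\le \int \left(\sqrt{e^{-m_1}} + \sqrt{e^{-m_2}}\right) P_1^{(l)}(0)(dm_1)\times P_2^{(l)}(0)(dm_2) \\
 &= CBP_1^{(l)}(0) + CBP_2^{(l)}(0), \qquad (19)
\end{aligned}$$

where the last inequality follows from the fact that $\forall \alpha, \beta >
0$, $\sqrt{\alpha + \beta} \le \sqrt{\alpha} + \sqrt{\beta}$. Since any check node with $d_c > 3$
can be viewed as the concatenation of many check nodes with
$d_c = 3$, by induction and by assuming the all-zero codeword
is transmitted, we have

$$CBQ^{(l)}(0) \le (d_c - 1)\,CBP^{(l)}(0). \qquad (20)$$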
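The pointwise inequality behind (19) can be spot-checked numerically. The sketch below draws random discrete LLR distributions (an assumption made purely for illustration; nothing here depends on them being actual density evolution outputs) and compares the Chernoff bound of the degree-3 check node output with the sum of the input Chernoff bounds.

```python
import math
import random

def check_node(m1, m2):
    """Degree-3 check node rule Psi_c(m1, m2) in its tanh form."""
    t = math.tanh(m1 / 2.0) * math.tanh(m2 / 2.0)
    return math.log((1.0 + t) / (1.0 - t))

def chernoff(dist):
    """Integral of exp(-m/2) against a discrete distribution of (m, prob)."""
    return sum(p * math.exp(-m / 2.0) for m, p in dist)

def random_dist(rng, atoms=6, span=4.0):
    """Random discrete LLR distribution with `atoms` mass points."""
    weights = [rng.random() for _ in range(atoms)]
    total = sum(weights)
    return [(rng.uniform(-span, span), w / total) for w in weights]

def check_output(d1, d2):
    """Output distribution for independent inputs (all-zero codeword)."""
    return [(check_node(m1, m2), p1 * p2) for m1, p1 in d1 for m2, p2 in d2]

rng = random.Random(0)
ok = True
for _ in range(200):
    d1, d2 = random_dist(rng), random_dist(rng)
    ok = ok and chernoff(check_output(d1, d2)) <= chernoff(d1) + chernoff(d2) + 1e-12
print(ok)
```

Because the bound holds for every pair $(m_1, m_2)$ before integration, it holds for any input distributions, which is why arbitrary random atoms suffice for the check.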



Since $CBP^{(l)}(0) = CBP^{(l)}(1)$ as in (18), the averaging
over all possible codewords does not change (20). By further
incorporating the check node degree polynomial $\rho$, we have

$$\forall x \in \{0, 1\}, \quad CBQ^{(l)}(x) \le \sum_k \rho_k (k - 1)\,\langle CBP^{(l)}\rangle
  = \rho'(1)\,\langle CBP^{(l)}\rangle.$$

By (15) and the fact that the moment generating function
of the convolution equals the product of individual moment
generating functions, we have

$$CBP^{(l+1)}(x) = CBP^{(0)}(x) \sum_k \lambda_k \left(CBQ^{(l)}(x)\right)^{k-1}
  \le CBP^{(0)}(x)\,\lambda\left(\rho'(1)\,\langle CBP^{(l)}\rangle\right),$$

which is equivalent to

$$\langle CBP^{(l+1)}\rangle \le \langle CBP^{(0)}\rangle\,\lambda\left(\rho'(1)\,\langle CBP^{(l)}\rangle\right). \qquad (21)$$

The sufficient stability theorem follows immediately from (21),
the iterative upper bound formula. ■
Remark: (21) is a one-dimensional iterative bound for general
asymmetric memoryless channels. In [30], this iterative upper
bound will be further strengthened to:

$$\langle CBP^{(l+1)}\rangle \le \langle CBP^{(0)}\rangle\,\lambda\left(1 - \rho\left(1 - \langle CBP^{(l)}\rangle\right)\right),$$

which is tight for BECs and holds for asymmetric channels as
well.
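The one-dimensional bound (21) is easy to iterate. The sketch below does so for the 12A ensemble of Section VI-A and compares the empirical decay ratio with the rate $\lambda_2\rho'(1) r$ from Theorem 5. The values $r = 0.5$ and the starting point $0.02$ (chosen below $\epsilon^*$ for this $r$) are illustrative assumptions.

```python
# 12A degree distributions (Section VI-A) as {node degree k: coefficient of x**(k-1)}.
LAMBDA_12A = {2: 0.24426, 3: 0.25907, 4: 0.01054, 5: 0.05510,
              8: 0.01455, 10: 0.01275, 12: 0.40373}
RHO_12A = {7: 0.25475, 8: 0.73438, 9: 0.01087}

def poly(coeffs, x):
    """Evaluate sum_k c_k x^(k-1)."""
    return sum(c * x ** (k - 1) for k, c in coeffs.items())

def deriv_at_one(coeffs):
    """Derivative of sum_k c_k x^(k-1) evaluated at x = 1."""
    return sum(c * (k - 1) for k, c in coeffs.items())

def iterate_bound(lam, rho, r, x0, iterations):
    """Iterate x_{l+1} = r * lambda(rho'(1) * x_l), the right-hand side of (21)."""
    slope = deriv_at_one(rho)
    xs = [x0]
    for _ in range(iterations):
        xs.append(r * poly(lam, slope * xs[-1]))
    return xs

r = 0.5
xs = iterate_bound(LAMBDA_12A, RHO_12A, r, x0=0.02, iterations=150)
rate = LAMBDA_12A[2] * deriv_at_one(RHO_12A) * r   # lambda_2 rho'(1) r
print(xs[-1] / xs[-2], rate)
```

Since (21) is only an upper bound on $\langle CBP^{(l)}\rangle$, the sequence produced here bounds the true Chernoff bound from above once the evolved value has dropped below the starting point; it does not replace the full density evolution.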



Proof of Theorem 6: We prove this result by the erasure
decomposition technique used in [23].

The erasure decomposition lemma in [23] states that, for any
$l_0 > 0$, and any symmetric channel $f$ with log likelihood ratio
distribution $P^{(l_0)}$, there exists a BEC $g$ with log likelihood
ratio distribution $B^{(l_0)}$ such that $f$ is physically degraded with
respect to $g$. Furthermore, $B^{(l_0)}$ is of the following form:

$$B^{(l_0)} = 2\epsilon\,\delta_0 + (1 - 2\epsilon)\,\delta_\infty,$$

for all $\epsilon \le p_e^{(l_0)}$, where $\delta_x$ is the Dirac-delta measure centered
at $x$. It can be easily shown that this erasure decomposition
lemma holds even when $f$ corresponds to a non-symmetric
channel with LLR distributions $\{P^{(l_0)}(x)\}_{x=0,1}$ and $p_e^{(l_0)}$
computed from @.

We can then assign $B^{(l_0)}(0) := B^{(l_0)}$ and $B^{(l_0)}(1) :=
B^{(l_0)} \circ I^{-1}$ to distinguish the distributions for different
transmitted symbols $x$.

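Suppose $r\lambda_2\rho'(1) > 1$ and $\lim_{l\to\infty} p_e^{(l)} = 0$. Then for any
$\epsilon > 0$, $\exists\, l_0 > 0$ such that $p_e^{(l_0)} \le \epsilon$. For simplicity, we assume
$p_e^{(l_0)} = \epsilon$. The physically better BEC is described as above.
If during the iteration procedure (15) we replace the density
$P^{(l_0)}(x)$ with $B^{(l_0)}(x)$, then the resulting density will be

$$\begin{aligned}
P_B^{(l_0 + \Delta l)}(0) &= 2\epsilon\,(\lambda_2\rho'(1))^{\Delta l}\,P^{(0)}(0)\otimes\langle P^{(0)}\rangle^{\otimes(\Delta l - 1)}
  + \left(1 - 2\epsilon\,(\lambda_2\rho'(1))^{\Delta l}\right)\delta_\infty + O(\epsilon^2), \\
P_B^{(l_0 + \Delta l)}(1) &= 2\epsilon\,(\lambda_2\rho'(1))^{\Delta l}\,P^{(0)}(1)\otimes\left(\langle P^{(0)}\rangle\circ I^{-1}\right)^{\otimes(\Delta l - 1)}
  + \left(1 - 2\epsilon\,(\lambda_2\rho'(1))^{\Delta l}\right)\delta_{-\infty} + O(\epsilon^2),
\end{aligned}$$

and the averaged error probability $p_{e,B}^{(l_0 + \Delta l)}$ is

$$p_{e,B}^{(l_0 + \Delta l)} = \int_{-\infty}^{0} d\left(\frac{P_B^{(l_0 + \Delta l)}(0) + P_B^{(l_0 + \Delta l)}(1)\circ I^{-1}}{2}\right)
  = O(\epsilon^2) + 2\epsilon\,(\lambda_2\rho'(1))^{\Delta l}\int_{-\infty}^{0} d\left(\langle P^{(0)}\rangle^{\otimes\Delta l}\right)(dm). \qquad (22)$$

By the fact that $r = \langle CBP^{(0)}\rangle$ is the Chernoff bound
on $\int_{-\infty}^{0} d\langle P^{(0)}\rangle$, the regularity condition and the Chernoff
theorem, for any $\epsilon' > 0$, there exists a large enough $\Delta l$ such
that

$$\int_{-\infty}^{0} d\left(\langle P^{(0)}\rangle^{\otimes\Delta l}\right) \ge (r - \epsilon')^{\Delta l}.$$

With a small enough $\epsilon'$, we have $\lambda_2\rho'(1)(r - \epsilon') > 1$. Thus
with large enough $\Delta l$, we have

$$p_{e,B}^{(l_0 + \Delta l)} > O(\epsilon^2) + 2\epsilon.$$

With small enough $\epsilon$ or equivalently large enough $l_0$, we have

$$p_{e,B}^{(l_0 + \Delta l)} > O(\epsilon^2) + 2\epsilon > \epsilon = p_e^{(l_0)}.$$

However, by the monotonicity with respect to physically degraded
channels we have $p_e^{(l_0 + \Delta l)} \ge p_{e,B}^{(l_0 + \Delta l)} > p_e^{(l_0)}$, which
contradicts the monotonicity of $p_e^{(l)}$ with respect to $l$. From
the above reasoning, if $r\lambda_2\rho'(1) > 1$, then $\lim_{l\to\infty} p_e^{(l)} > 0$,
which completes the proof. ■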
Remark: From the sufficient stability condition, for those
codes with $\lambda_2 > 0$, the convergence rate is exponential in
$l$, i.e. BER $= O\left((r\lambda_2\rho'(1))^{l}\right)$. However, the number of bits
involved in the $\mathcal{N}^{2l}$ tree is $O\left(((d_v - 1)(d_c - 1))^{l}\right)$, which
usually grows much faster than the reciprocal of the decrease rate
of BER $= O\left((r\lambda_2\rho'(1))^{l}\right)$. As a result, we conjecture that
the average performance of the code ensemble with $\lambda_2 > 0$
will have bad block error probabilities. This is confirmed in
Fig. 5(b) and theoretically proved for the BEC in [32]. The
converse is stated and proved in the following corollary.

Corollary 4: Consider the block error probability
of codeword length $n$ after $l$ iterations of the belief
propagation algorithm, averaged over equiprobable
codewords, channel realizations, and the code ensemble
$\mathcal{C}^n(\lambda, \rho)$. If $\lambda_2 = 0$ and $l_n$ satisfies $\ln\ln n = o(l_n)$ and
$l_n = o(\ln n)$, then this averaged block error probability after
$l_n$ iterations tends to zero as $n \to \infty$.
Proof: This result can be proven directly by the cycle-free
convergence theorem, the super-exponential bit convergence
rate with respect to $l$, and the union bound. ■
A similar observation is also made and proved in [25], in 
which it is shown that the interleaving gain exponent of the 
block error rate is $-J + 2$, where $J$ is the number of parallel
constituent codes. The variable node degree d v is the number 
of parity check equations (parity check sub-codes) in which a 
variable bit participates. In a sense, an LDPC code is similar 
to d v parity check codes interleaved together. With d v = 2, 
good interleaving gain for the block error probability is not 
expected. 

VI. Simulations and Discussion 

It is worth noting that for non-symmetric channels, differ- 
ent codewords will have different error-resisting capabilities. 
In this section, we consider the averaged performance. We 
can obtain codeword-independent performance by adding a 
random number to the information message before encoding 
and then subtracting it after decoding. This approach, however, 
introduces higher computational cost. 

A. Simulation Settings 

With the help of the sufficient condition of the stability
theorem (Theorem 5), we can use $\epsilon^*$ to set a stopping criterion
for the iterations of the density evolution. We use the 8-bit
quantized density evolution method with $(-15, 15)$ being the
domain of the LLR messages. We will determine the largest
thresholds such that the evolved Chernoff bound $\langle CBP^{(l)}\rangle$
hits $\epsilon^*$ within 100 iterations, i.e. $\langle CBP^{(100)}\rangle < \epsilon^*$. Better
performance can be achieved by using more iterations, which,
however, is of less practical interest. For example, the 500-iteration
threshold of our best code for z-channels, 12B
(described below), is 0.2785, compared to the 100-iteration
threshold 0.2731. Five different code ensembles with rate
1/2 are extensively simulated, including regular (3,6) codes,
regular (4,8) codes, 12A codes, 12B codes, and 12C codes,
where

• 12A: 12A is a rate-1/2 code ensemble found by Richardson,
et al. in [23], which is the best known degree
distribution optimized for the symmetric BiAWGNC,
having maximum degree constraints $\max d_v \le 12$ and
$\max d_c \le 9$. Its degree distributions are

$$\lambda(x) = 0.24426x + 0.25907x^2 + 0.01054x^3 + 0.05510x^4 + 0.01455x^7 + 0.01275x^9 + 0.40373x^{11},$$
$$\rho(x) = 0.25475x^6 + 0.73438x^7 + 0.01087x^8.$$

• 12B: 12B is a rate-1/2 code ensemble obtained by
minimizing the hitting time of $\epsilon^*$ in z-channels, through
hill-climbing and linear programming techniques. The
maximum degree constraints are also $\max d_v \le 12$
and $\max d_c \le 9$. The differences between 12A and
12B are (1) 12B is optimized for the z-channels with
our codeword-averaged density evolution, and 12A is
optimized for the symmetric BiAWGNC. (2) 12B is
optimized with respect to the hitting time of $\epsilon^*$ (depending
on $(\lambda, \rho)$) rather than a fixed small threshold. The degree
distributions of 12B are

$$\lambda(x) = 0.236809x + 0.309590x^2 + 0.032789x^3 + 0.007116x^4 + 0.000001x^5 + 0.413695x^{11},$$
$$\rho(x) = 0.000015x^5 + 0.464854x^6 + 0.502485x^7 + 0.032647x^8.$$

• 12C: 12C is a rate-1/2 code ensemble similar to 12B, but
with $\lambda_2$ being hard-wired to 0, which is suggested by the
convergence rate in the sufficient stability condition. The
degree distributions of 12C are

$$\lambda(x) = 0.861939x^2 + 0.000818x^3 + 0.000818x^4 + 0.000818x^5 + 0.000818x^6 + 0.000818x^7 + 0.000218x^8 + 0.077898x^9 + 0.055843x^{10} + 0.000013x^{11},$$
$$\rho(x) = 0.000814x^4 + 0.560594x^5 + 0.192771x^6 + 0.145207x^7 + 0.100613x^8.$$

Four different channels are considered, including the BEC,
BSC, z-channel, and BiAWGNC. Z-channels are simulated
by binary non-symmetric channels with very small $\epsilon_0$ ($\epsilon_0 =
0.00001$) and different values of $\epsilon_1$. TABLE I summarizes
the thresholds with precision $10^{-4}$. Thresholds are not only
presented by their conventional channel parameters, but also
by their Bhattacharyya noise parameters (Chernoff bounds).
The column "stability" lists the maximum $r := \langle CBP^{(0)}\rangle$
such that $r\lambda_2\rho'(1) < 1$, which is an upper bound on the
$\langle CBP^{(0)}\rangle$ values of decodable channels. Further discussion
of the relationship between $\langle CBP^{(0)}\rangle$ and the decodable
threshold can be found in [30].
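The conversion between conventional channel parameters and Bhattacharyya noise parameters used in TABLE I can be sketched as follows. The closed forms are the standard ones for each channel; the BiAWGNC expression assumes unit-energy antipodal signaling with noise standard deviation $\sigma$, and the function names are hypothetical.

```python
import math

def cbp_bec(eps):
    """BEC with erasure probability eps."""
    return eps

def cbp_bsc(eps):
    """BSC with crossover probability eps."""
    return 2.0 * math.sqrt(eps * (1.0 - eps))

def cbp_basc(eps0, eps1):
    """Binary asymmetric channel; a z-channel is the case of very small eps0."""
    return math.sqrt((1.0 - eps0) * eps1) + math.sqrt(eps0 * (1.0 - eps1))

def cbp_biawgn(sigma):
    """Binary-input AWGN channel, +/-1 signaling, noise variance sigma**2."""
    return math.exp(-1.0 / (2.0 * sigma ** 2))

# Thresholds of the regular (3,6) code taken from TABLE I.
print(cbp_bec(0.4294), cbp_bsc(0.0837),
      cbp_basc(0.00001, 0.2305), cbp_biawgn(0.8790))
```

Because the thresholds in TABLE I are rounded to four digits, the values computed this way can differ from the tabulated $\langle CBP\rangle$ entries in the last digit.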

From TABLE I we observe that 12A outperforms 12B in
Gaussian channels (for which 12A is optimized), but 12B
is superior in z-channels for which it is optimized. The
above behavior promises room for improvement with codes
optimized for different channels, as was also shown in [14].

TABLE I
Thresholds of different codes and channels, with precision $10^{-4}$.

Codes             BEC               BSC               Z-channels        BiAWGNC           Stability
                  ε       ⟨CBP⟩     ε       ⟨CBP⟩     ε_1     ⟨CBP⟩     σ       ⟨CBP⟩     ⟨CBP⟩
(3,6)             0.4294  0.4294    0.0837  0.5539    0.2305  0.4828    0.8790  0.5235    -
(4,8)             0.3834  0.3834    0.0764  0.5313    0.1997  0.4497    0.8360  0.4890
12A               0.4682  0.4682    0.0937  0.5828    0.2710  0.5233    0.9384  0.5668    0.6060
12B               0.4753  0.4753    0.0939  0.5834    0.2731  0.5253    0.9362  0.5653    0.6247
12C               0.4354  0.4354    0.0862  0.5613    0.2356  0.4881    0.8878  0.5303
Sym. Info. Rate   0.5000  0.5000    0.1100  0.6258    0.2932  0.5415    0.9787  0.5933
Capacity          0.5000  0.5000    0.1100  0.6258    0.3035  0.5509    0.9787  0.5933

Fig. 4. Asymptotic thresholds and the achievable regions of different codes
in binary asymmetric channels.

Fig. 4 demonstrates the asymptotic thresholds of these codes
in binary asymmetric channels (BASCs), with the curves of
12A and 12B being very close together. It is seen that 12B
is slightly better when $\epsilon_0, \epsilon_1 \to 0$ or $\epsilon_0 \gg \epsilon_1$. We notice
that all the achievable regions of these codes are bounded
by the symmetric mutual information rate (with a (1/2, 1/2)
a priori distribution), which was also suggested in [16]. The
difference between the symmetric mutual information rate and
the capacity for non-symmetric channels is generally
indistinguishable from the practical point of view. For example, in
[33], it was shown that the ratio between the symmetric mutual
information rate and the capacity is lower bounded by $\frac{e\ln 2}{2} \approx
0.942$. [34] further proved that the absolute difference is upper
bounded by 0.011 bit/sym. Further discussion of capacity
achieving codes with non-uniform a priori distributions can
be found in [35] and [29].
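For the z-channel both quantities have closed forms, so the gap between the symmetric mutual information rate and the capacity is easy to inspect. The sketch below assumes an ideal z-channel ($\epsilon_0 = 0$, whereas the simulations above use $\epsilon_0 = 0.00001$) and uses the standard z-channel capacity formula; the function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def z_symmetric_rate(eps1):
    """I(X;Y) with a (1/2, 1/2) input; only the input 1 is flipped, w.p. eps1."""
    return h2((1.0 - eps1) / 2.0) - 0.5 * h2(eps1)

def z_capacity(eps1):
    """Capacity of the z-channel with one-way crossover probability eps1."""
    if eps1 >= 1.0:
        return 0.0
    return math.log2(1.0 + (1.0 - eps1) * eps1 ** (eps1 / (1.0 - eps1)))

for eps1 in (0.2932, 0.3035, 0.5):
    rate, cap = z_symmetric_rate(eps1), z_capacity(eps1)
    print(eps1, rate, cap, rate / cap, cap - rate)
```

At $\epsilon_1 = 0.2932$ the symmetric rate is close to 1/2 and at $\epsilon_1 = 0.3035$ the capacity is close to 1/2, which is how the last two rows of the z-channel column in TABLE I can be read.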

Figs. 5(a) and 5(b) consider several fixed finite codes
in z-channels. We arbitrarily select graphs from the code
ensemble with codeword lengths $n = 1{,}000$ and $n = 10{,}000$.
Then, with these graphs (codes) fixed, we find the corresponding
parity matrix A, use Gaussian elimination to find
the generator matrix G, and transmit different codewords
by encoding equiprobably selected information messages.
Belief propagation decoding is used with 40 iterations for
each codeword. 10,000 codewords are transmitted, and the
overall bit/block error rates versus different $\epsilon_1$ are plotted
for different code ensembles and codeword lengths. Our new
density evolution predicts the waterfall region quite accurately
when the bit error rates are of primary interest. Though
there are still gaps between the performance of finite codes
and our asymptotic thresholds, the performance gaps between
different finite length codes are very well predicted by the
differences between their asymptotic thresholds. From the
above observations and the underpinning theorems, we see
that our new density evolution is a successful generalization
of the traditional one from both practical and theoretical points
of view.
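The step from a parity matrix to a generator matrix by Gaussian elimination over GF(2) can be sketched as below. This is a minimal illustration on plain Python lists, assuming a small dense matrix; it is not the simulation code used for Figs. 5 and 6, and the (7,4) Hamming parity matrix is only a toy stand-in for an LDPC graph instance.

```python
def generator_from_parity(H):
    """Return rows spanning the GF(2) null space of H (a generator matrix)."""
    rows = [row[:] for row in H]
    n = len(rows[0])
    pivots = []
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue                          # no pivot here: free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):            # clear column c in all other rows
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(n) if c not in pivots]
    G = []
    for f in free:                            # one basis codeword per free column
        word = [0] * n
        word[f] = 1
        for i, p in enumerate(pivots):
            word[p] = rows[i][f]              # pivot bit is fixed by row i
        G.append(word)
    return G

def encode(message, G):
    """XOR together the generator rows selected by the message bits."""
    word = [0] * len(G[0])
    for bit, row in zip(message, G):
        if bit:
            word = [a ^ b for a, b in zip(word, row)]
    return word

def syndrome(H, word):
    """Parity checks of `word`; all zeros for a valid codeword."""
    return [sum(h & w for h, w in zip(row, word)) % 2 for row in H]

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
G = generator_from_parity(H)
print(len(G), syndrome(H, encode([1, 0, 1, 1], G)))
```

For the codeword lengths used in the simulations a sparse or bit-packed representation would be more practical; the dense version is kept here for readability.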

Fig. 5(b) exhibits the block error rate of the same 10,000-codeword
simulation. The conjecture of bad block error probabilities
for $\lambda_2 > 0$ codes is confirmed. Besides the conjectured
bad block error probabilities, Figs. 5(a) and 5(b) also suggest
that codes with $\lambda_2 = 0$ will have a better error floor compared
to those with $\lambda_2 > 0$, which can be partly explained by the
comparatively slow convergence speed stated in the sufficient
stability condition for $\lambda_2 > 0$ codes. 12C is so far the best
code we have for $\lambda_2 = 0$. However, its threshold is not as
good as those of 12A and 12B. If good block error rate and
low error floor are our major concerns, 12C (or other codes
with $\lambda_2 = 0$) can still be competitive choices. Recent results
in [36] show that the error floor for codes with $\lambda_2 > 0$ can be
lowered by carefully arranging the degree two variable nodes
in the corresponding graph while keeping a similar waterfall
threshold.

Figs. 6(a) and 6(b) illustrate the bit error rates versus
different BASC settings with 2,000 transmitted codewords.
Our computed density evolution threshold is again highly
correlated with the performance of finite length codes for
different asymmetric channel settings.

Fig. 5. Bit/block error rates versus $\epsilon_1$ with fixed $\epsilon_0 = 0.00001$. Panels: (a) Bit error rates; (b) Block error rates.
Computed thresholds for symmetric mutual information rate, (3,6), 12A, 12B, and 12C
codes are 0.2932, 0.2305, 0.2710, 0.2730, and 0.2356, respectively. 40 iterations of belief
propagation algorithms were performed. 10,000 codewords were used for the simulations.

Fig. 6. Bit error rates versus $\epsilon_1$ for $\epsilon_0 = 0.01$ and $\epsilon_0 = 0.07$. Panels: (a) 12A & 12B;
(b) 12C & regular (3,6) codes. The DE thresholds of (12A, 12B, 12C, (3,6)) are
(0.2346, 0.2332, 0.2039, 0.1981) for $\epsilon_0 = 0.01$ and (0.1202, 0.1206, 0.1036, 0.0982) for
$\epsilon_0 = 0.07$. 40 iterations of belief propagation algorithms were performed. 2,000 codewords
were used for the simulations.

We close this section by highlighting two applications of
our results.

1) Error Floor Analysis: "The error floor" is a characteristic
of iterative decoding algorithms, which is of practical
importance and may not be able to be determined solely
by simulations. More analytical tools are needed to find
error floors for corresponding codes. Our convergence
rate statements in the sufficient stability condition may
shed some light on finding codes with low error floors.
2) Capacity-Approaching Codes for General Non-standard
Channels: Various very good codes (capacity-approaching)
are known for standard channels,
but very good codes for non-standard channels
are not yet known. It is well known that one can
construct capacity-approaching codes by incorporating
symmetric-information-rate-approaching linear codes
with the symbol mapper and demapper as an inner
code [29], [35], [37]. Understanding density evolution
for general memoryless channels allows us to construct
such symmetric-information-rate-approaching codes
(for non-symmetric memoryless channels), and thus to
find capacity-approaching codes after concatenating the
inner symbol mapper and demapper. It is worth noting
that intersymbol interference channels are dealt with by
Kavcic et al. in [16] using the coset codes approach.
It will be of great help if a unified framework for
non-symmetric channels with memory can be found by
incorporating both coset codes and codeword averaging
approaches.

VII. Further Implications of Generalized Density 
Evolution 

A. Typicality of Linear LDPC Codes 

One reason that non-symmetric channels are often overlooked
is that we can always transform a non-symmetric channel
into a symmetric channel. Depending on different points of
view, this channel-symmetrizing technique is termed the coset
code argument [16] or dithering/the i.i.d. channel adapter
[21], as illustrated in Figs. 7(c) and 7(b). Our generalized
density evolution provides a simple way to directly analyze
the linear LDPC code ensemble on non-symmetric channels,
as in Fig. 7(a).

Fig. 7. Comparison of the approaches based on codeword averaging and the coset code ensemble.
Panels: (a) Linear Code Ensemble versus Non-symmetric Channels; (b) Linear Code Ensemble
versus Symmetrized Channels; (c) Coset Code Ensemble versus Non-symmetric Channels.

As shown in Theorems 5 and 6, the necessary and sufficient
stability conditions of linear LDPC codes for non-symmetric
channels, Fig. 7(a), are identical to those of the coset code
ensemble, Fig. 7(c). Monte Carlo simulations based on finite-length
codes ($n = 10^4$) [21] further show that the codeword-averaged
performance in Fig. 7(a) is nearly identical$^6$ to the
performance of Fig. 7(c) when the same encoder/decoder pair
is used. The above two facts suggest a close relationship
between linear codes and the coset code ensemble, and it
was conjectured in [21] that the scheme in Fig. 7(a) should
always have the same/similar performance as those illustrated
by Fig. 7(c). This short subsection is devoted to the question
whether the systems in Figs. 7(a) and 7(c) are equivalent
in terms of performance. In sum, the performance of the
linear code ensemble is very unlikely to be identical to that
of the coset code ensemble. However, when the minimum
check node degree $d_{c,\min} := \min\{k \in \mathbb{N} : \rho_k > 0\}$ is sufficiently large, we
can prove that their performance discrepancy is theoretically
indistinguishable. In practice, the discrepancy for $d_{c,\min} \ge 6$
is $< 0.05\%$.

Let $P_{a.p.}^{(l)}(0) := P^{(l)}(0)$ and $P_{a.p.}^{(l)}(1) := P^{(l)}(1)\circ I^{-1}$
denote the two evolved densities with aligned parity, and
similarly define $Q_{a.p.}^{(l)}(0) := Q^{(l)}(0)$ and $Q_{a.p.}^{(l)}(1) := Q^{(l)}(1)\circ
I^{-1}$. Our main result in (15) can be rewritten in the following



Fig. 8. Density evolution for z-channels with the linear code ensemble and
the coset code ensemble. (Plot: regular (3,4) code on a z-channel with
p = (0.00001, 0.4540); horizontal axis: number of iterations; curves: $p_e$ and
CBP for the symmetrized channel, $p_e(x=0)$, $p_e(x=1)$, and $\langle CBP\rangle$; marker:
passing threshold.)



form: 




P r 



(23) 



6 That is, it is within the precision of the Monte Carlo simulation. 



TABLE II
Threshold comparison $p^*_{1\to 0}$ of linear and coset LDPC codes on z-channels

$(\lambda, \rho)$    $(x^2, x^3)$    $(x^2, x^5)$
Linear               0.4540          0.2305
Coset                0.4527          0.2304

$(\lambda, \rho)$    $(x^2, 0.5x^2 + 0.5x^3)$    $(x^2, 0.5x^4 + 0.5x^5)$
Linear               0.5888                      0.2689
Coset                0.5908                      0.2690

Namely, the asymptotic decodable thresholds of the linear
and the coset code ensemble are arbitrarily close when the
minimum check node degree $d_{c,\min}$ is sufficiently large.

Similar corollaries can be constructed for other channel
models with different types of noise parameters, e.g., the $\sigma^*$
in the composite BiAWGNC. A proof of Corollary 5 is found
in Appendix III.

Let $p_{e,\mathrm{linear}}^{(l)}$ denote the corresponding bit error probability
of the linear codes after $l$ iterations. For comparison, the
traditional formula of density evolution for the symmetrized
channel (the coset code ensemble) is as follows:

Proof of Theorem 7: Since the functionals in (23) and
(24) are continuous with respect to convergence in distribution,
we need only to show that $\forall l \in \mathbb{N}$,



lim Qt P 1] (0) 



v 



A^oo 



lim Q«-V(l) 



A^oo 



P 



(0 



,(0) 



/ coset 



where P, 



(0) 



P 

: r 
E«=o,i^» 



coset 

1 



i^Qcoset^ 



P 



(2-1) 




P^W + P^W 



Similarly, let p : 



(0 

e, coset 



(24) 



denote 



Q&\o) + Q&\l) 



(25) 



coset 2 

the corresponding bit error probability. 

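It is clear from the above formulae that when the channel
of interest is symmetric, namely $P_{a.p.}^{(0)}(0) = P_{a.p.}^{(0)}(1)$, we have
$P_{\mathrm{coset}}^{(l)} = P_{a.p.}^{(l)}(0) = P_{a.p.}^{(l)}(1)$ for all $l \in \mathbb{N}$. However, for
non-symmetric channels, since the variable node iteration involves
convolution of several densities given the same $x$ value, the
difference between $Q_{a.p.}^{(l-1)}(0)$ and $Q_{a.p.}^{(l-1)}(1)$ will be amplified
after each variable node iteration. Hence it is very unlikely
that the decodable thresholds of linear codes and coset codes
will be analytically identical, namely, that

$$\lim_{l\to\infty} p_{e,\mathrm{linear}}^{(l)} = 0 \iff \lim_{l\to\infty} p_{e,\mathrm{coset}}^{(l)} = 0.$$

Fig. 8 demonstrates the traces of the evolved densities for the
regular (3,4) code on z-channels. With the one-way crossover
probability being 0.4540, the generalized density evolution for
linear codes is able to converge within 179 iterations, while the
coset code ensemble shows no convergence within 500 iterations.
This demonstrates the possible performance discrepancy,
though we do not have analytical results proving that the latter
will not converge after further iterations. TABLE II compares
the decodable thresholds such that the density evolution enters
the stability region within 100 iterations. We notice that
the larger $d_{c,\min}$ is, the smaller the discrepancy is. This
phenomenon can be characterized by the following theorem.

Theorem 7: Consider non-symmetric memoryless channels
and a fixed pair of finite-degree polynomials $\lambda$ and $\rho$. The
shifted version of the check node polynomial is denoted as
$\rho_\Delta = x^\Delta\cdot\rho$ where $\Delta \in \mathbb{N}$. Let $P_{\mathrm{coset}}^{(l)}$ denote the evolved
density from the coset code ensemble with degrees $(\lambda, \rho_\Delta)$,
and $\langle P^{(l)}\rangle = \frac{1}{2}\sum_{x=0,1} P_{a.p.}^{(l)}(x)$ denote the averaged density
from the linear code ensemble with degrees $(\lambda, \rho_\Delta)$. For any
$l_0 \in \mathbb{N}$, $\lim_{\Delta\to\infty}\langle P^{(l)}\rangle = P_{\mathrm{coset}}^{(l)}$ in distribution for all $l \le l_0$,
with the convergence rate for each iteration being $O(\mathrm{const}^\Delta)$
for some $\mathrm{const} < 1$.

Corollary 5 (The Typicality Results for Z-Channels): For
any $\epsilon > 0$, there exists a $\Delta \in \mathbb{N}$ such that

$$\left|\sup\left\{p_{1\to 0} : \lim_{l\to\infty} p_{e,\mathrm{linear}}^{(l)} = 0\right\}
  - \sup\left\{p_{1\to 0} : \lim_{l\to\infty} p_{e,\mathrm{coset}}^{(l)} = 0\right\}\right| < \epsilon.$$

$^7$ We also need to assume that $\forall x$, $P_{a.p.}^{(l-1)}(x)(m = 0) = 0$ so that
$\ln\coth\left|\frac{m}{2}\right| \in \mathbb{R}^+$ almost surely. This assumption can be relaxed by
separately considering the event that $m_{in,i} = 0$ for some $i \in \{1, \cdots, d_c - 1\}$.

Here $\stackrel{D}{=}$ in (25) denotes convergence in distribution. Then by
inductively applying this weak convergence argument, for any
bounded $l_0$, $\lim_{\Delta\to\infty}\langle P^{(l)}\rangle = P_{\mathrm{coset}}^{(l)}$ in distribution for all
$l \le l_0$. Without loss of generality,$^7$ we may assume $\rho_\Delta = x^\Delta$
and prove the weak convergence of distributions on the domain

$$\gamma(m) := \left(1_{\{m \le 0\}}, \ln\coth\left|\frac{m}{2}\right|\right) = (\gamma_1, \gamma_2) \in GF(2)\times\mathbb{R}^+,$$

on which the check node iteration becomes

$$\gamma_{out,\Delta} = \gamma_{in,1} + \gamma_{in,2} + \cdots + \gamma_{in,\Delta}.$$

Let $P_0'$ denote the density of $\gamma_{in}(m)$ given that the distribution
of $m$ is $P_{a.p.}^{(l-1)}(0)$ and let $P_1'$ similarly correspond to
$P_{a.p.}^{(l-1)}(1)$. Similarly let $Q_{0,\Delta}'$ and $Q_{1,\Delta}'$ denote the output
distributions on $\gamma_{out,\Delta}$ when the check node degree is $\Delta + 1$. It
is worth noting that any pair of $Q_{0,\Delta}'$ and $Q_{1,\Delta}'$ can be mapped
bijectively to the LLR distributions $Q_{a.p.}^{(l-1)}(0)$ and $Q_{a.p.}^{(l-1)}(1)$.

Let $\Phi_{P'}(k, r) := E_{P'}\left\{(-1)^{k\gamma_1} e^{i r \gamma_2}\right\}$, $\forall k \in \mathbb{N}, r \in \mathbb{R}$,
denote the Fourier transform of the density $P'$. Proving (25)
is equivalent to showing that

$$\forall k \in \mathbb{N}, r \in \mathbb{R}, \quad \lim_{\Delta\to\infty}\Phi_{Q_{0,\Delta}'}(k, r) = \lim_{\Delta\to\infty}\Phi_{Q_{1,\Delta}'}(k, r).$$

However, to deal with the strictly growing average of the
"limit distribution", we concentrate on the distribution of the
normalized output $\frac{\gamma_{out,\Delta}}{\Delta}$ instead. We then need to prove that

$$\forall k \in \mathbb{N}, r \in \mathbb{R}, \quad \lim_{\Delta\to\infty}\Phi_{Q_{0,\Delta}'}\left(k, \frac{r}{\Delta}\right) = \lim_{\Delta\to\infty}\Phi_{Q_{1,\Delta}'}\left(k, \frac{r}{\Delta}\right).$$

We first note that for all $x = 0, 1$, $Q_{x,\Delta}'$ is the averaged
distribution of $\gamma_{out,\Delta}$ when the inputs $\gamma_{in,i}$ are governed by
$P_{a.p.}^{(l-1)}(x_i)$ satisfying $\sum_{i=1}^{\Delta} x_i = x$. From this observation, we
can derive the following iterative equations: $\forall \Delta \in \mathbb{N}$,

$$\begin{aligned}
\Phi_{Q_{0,\Delta}'}\left(k, \tfrac{r}{\Delta}\right) &= \frac{\Phi_{Q_{0,\Delta-1}'}\left(k, \tfrac{r}{\Delta}\right)\Phi_{P_0'}\left(k, \tfrac{r}{\Delta}\right) + \Phi_{Q_{1,\Delta-1}'}\left(k, \tfrac{r}{\Delta}\right)\Phi_{P_1'}\left(k, \tfrac{r}{\Delta}\right)}{2}, \\
\Phi_{Q_{1,\Delta}'}\left(k, \tfrac{r}{\Delta}\right) &= \frac{\Phi_{Q_{0,\Delta-1}'}\left(k, \tfrac{r}{\Delta}\right)\Phi_{P_1'}\left(k, \tfrac{r}{\Delta}\right) + \Phi_{Q_{1,\Delta-1}'}\left(k, \tfrac{r}{\Delta}\right)\Phi_{P_0'}\left(k, \tfrac{r}{\Delta}\right)}{2}.
\end{aligned}$$

By induction, the difference thus becomes

$$\Phi_{Q_{0,\Delta}'}\left(k, \tfrac{r}{\Delta}\right) - \Phi_{Q_{1,\Delta}'}\left(k, \tfrac{r}{\Delta}\right)
  = 2\left(\frac{\Phi_{P_0'}\left(k, \tfrac{r}{\Delta}\right) - \Phi_{P_1'}\left(k, \tfrac{r}{\Delta}\right)}{2}\right)^{\Delta}. \qquad (26)$$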
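The reason for working on the domain $\gamma = (\gamma_1, \gamma_2)$ is that the check node rule becomes a plain sum there: the sign bits add in GF(2) and the magnitudes $\ln\coth|m/2|$ add in $\mathbb{R}^+$. The sketch below compares this additive form with the usual tanh rule for a few nonzero LLR inputs; the function names and the sample inputs are illustrative assumptions.

```python
import math

def phi(x):
    """phi(x) = ln coth(x/2) for x > 0; this map is its own inverse."""
    return -math.log(math.tanh(x / 2.0))

def to_gamma(m):
    """Map a nonzero LLR m to (gamma_1, gamma_2) in GF(2) x R+."""
    return (1 if m <= 0 else 0, phi(abs(m)))

def check_node_tanh(ms):
    """Check node output via the tanh rule."""
    prod = 1.0
    for m in ms:
        prod *= math.tanh(m / 2.0)
    return 2.0 * math.atanh(prod)

def check_node_gamma(ms):
    """Check node output via summation on the gamma domain."""
    sign_bit, magnitude = 0, 0.0
    for m in ms:
        g1, g2 = to_gamma(m)
        sign_bit ^= g1            # addition in GF(2)
        magnitude += g2           # addition in R+
    out = phi(magnitude)          # map the summed magnitude back to an LLR
    return -out if sign_bit else out

inputs = [1.2, -0.7, 2.5, -3.1, 0.4]
print(check_node_tanh(inputs), check_node_gamma(inputs))
```

Inputs equal to zero are excluded, in line with footnote 7, because $\ln\coth|m/2|$ is infinite at $m = 0$.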



By Taylor's expansion and the BASC decomposition argument
in [30], we can show that for all $k \in \mathbb{N}$, $r \in \mathbb{R}$, and for all
possible $P_0'$ and $P_1'$, the quantity in (26) converges to zero with
convergence rate $O(\mathrm{const}^\Delta)$ for some $\mathrm{const} < 1$. A detailed
derivation of the convergence rate is given in Appendix IV.
Since the limit of the right-hand side of (26) is zero, the
proof of weak convergence is complete. The exponentially fast
convergence rate $O(\mathrm{const}^\Delta)$ also justifies the fact that even
for moderate $d_{c,\min} \ge 6$, the performances of linear and coset
LDPC codes are very close. ■
Remark 1: Consider any non-perfect message distribution, namely, $\exists x_0$ such that $P^{(l-1)}_{a.p.}(x_0) \neq \delta_\infty$. A persistent reader may notice that $\forall x$, $\lim_{\Delta\to\infty} Q^{(l-1)}_{a.p.}(x) = \delta_0$, namely, as $\Delta$ becomes large, all information is erased after passing a check node of large degree. If this convergence (erasure effect) occurs earlier than the convergence of $Q^{(l-1)}_{a.p.}(0)$ and $Q^{(l-1)}_{a.p.}(1)$, the performances of linear and coset LDPC codes are "close" only when the code is "useless."8 To quantify the convergence rate, we consider again the distributions on $\gamma$ and their Fourier transforms. For the average of the output distributions $Q^{(l-1)}_{a.p.}(x)$, we have

$$\frac{\Phi_{Q'_{0,\Delta}}\!\left(k,\frac{r}{\Delta}\right) + \Phi_{Q'_{1,\Delta}}\!\left(k,\frac{r}{\Delta}\right)}{2} = \left(\frac{\Phi_{P'_0}\!\left(k,\frac{r}{\Delta}\right) + \Phi_{P'_1}\!\left(k,\frac{r}{\Delta}\right)}{2}\right)^{\Delta}. \qquad (27)$$

By Taylor's expansion and the BASC decomposition argument, one can show that the limit of (27) exists and the convergence rate is $O(\Delta^{-1})$. (A detailed derivation is included in Appendix IV.) This convergence rate is much slower than the exponential rate $O(\mathrm{const}^{\Delta})$ in the proof of Theorem 7. Therefore, we do not need to worry about the case in which the required $\Delta$ for the convergence of $Q^{(l-1)}_{a.p.}(0)$ and $Q^{(l-1)}_{a.p.}(1)$ is excessively large so that $\forall x \in \mathrm{GF}(2)$, $Q^{(l-1)}_{a.p.}(x) \approx \delta_0$.

8 To be more precise, it corresponds to an extremely high-rate code and the information is erased after every check node iteration.

Remark 2: The intuition behind Theorem 7 is that when the minimum $d_c$ is sufficiently large, the parity check constraint becomes relatively less stringent. Thus we can approximate the density of the outgoing messages for linear codes by assuming all bits involved in that particular parity check equation are "independently" distributed among $\{0, 1\}$, which leads to the formula for the coset code ensemble. On the other hand, extremely large $d_c$ is required for a check node iteration to completely destroy all information coming from the previous iteration. This explains the difference between their convergence rates: $O(\mathrm{const}^{\Delta})$ versus $O(\Delta^{-1})$.

Fig. 9 illustrates the weak convergence predicted by Theorem 7 and depicts the convergence rates of $Q^{(l-1)}_{a.p.}(0) \to Q^{(l-1)}_{a.p.}(1)$ and $\frac{Q^{(l-1)}_{a.p.}(0) + Q^{(l-1)}_{a.p.}(1)}{2} \to \delta_0$.



Our typicality result can be viewed as a complementing 
theorem of the concentration theorem in [Corollary 2.2 of 
[16]], where a constructive method of finding a typical coset- 
defining syndrome is not specified. Besides the theoretical 
importance, we are now on a solid basis to interchangeably 
use the linear LDPC codes and the LDPC coset codes when 
the check node degree is of moderate size. For instance, from 
the implementation point of view, the hardware uniformity 
of linear codes makes them a superior choice compared to 
any other coset code. We can then use the fast density 
evolution [38] plus the coset code ensemble to optimize the 
degree distribution for the linear LDPC codes. Or instead 
of simulating the codeword-averaged performance of linear 
LDPC codes, we can simulate the error probability of the 
all-zero codeword in the coset code ensemble, in which the 
efficient LDPC encoder [8] is not necessary. 

B. Revisiting the Belief Propagation Decoder 

Two known facts about the BP algorithm and the density evolution method are as follows. First, the BP algorithm is optimal for any cycle-free network, since it exploits the independence of the incoming LLR messages. Second, by the cycle-free convergence theorem, the traditional density evolution is able to predict the behavior of the BP algorithm (designed for the tree structure) for $l_0$ iterations, even when we are focusing on a Tanner graph of a finite-length LDPC code, which inevitably has many cycles. The performance of BP, predicted by density evolution, is outstanding so that we "implicitly assume" that the BP (designed for the tree structure) is optimal for the first $l_0$ iterations in terms of minimizing the codeword-averaged bit error rate (BER). Theoretically, to be able to minimize the codeword-averaged BER, the optimal decision rule inevitably must exploit the global knowledge about all possible codewords, which is, however, not available to the BP decoder. A question of interest is whether BP is indeed optimal for the first $l_0$ iterations. Namely, with only local knowledge about possible codewords, does BP have the same performance as the optimal detector with the global information about the entire codebook and unlimited computational power when we are only interested in the first $l_0$ iterations? The answer is a straightforward corollary to Theorem 2, the convergence to perfect projection, which provides the missing link regarding the optimality of BP when only local observations (on the $\mathcal{N}^{2l}$) are available.

Fig. 9. Illustration of the weak convergence of $Q^{(l-1)}_{a.p.}(0)$ and $Q^{(l-1)}_{a.p.}(1)$. One can see that the convergence of $Q^{(l-1)}_{a.p.}(0)$ and $Q^{(l-1)}_{a.p.}(1)$ is faster than the convergence of $\frac{Q^{(l-1)}_{a.p.}(0) + Q^{(l-1)}_{a.p.}(1)}{2}$ and $\delta_0$. (Panel title: the check node iteration, with $d_c = 6$, one-way crossover probability 0.2305.)

Theorem 8 (Local Optimality of the BP Decoder): Fix $i, l_0 \in \mathbb{N}$. For sufficiently large codeword length $n$, almost all instances in the random code ensemble have the property that the BP decoder for $x_i$ after $l_0$ iterations, $X_{\mathrm{BP}}(\mathbf{Y}^{l_0})$, coincides with the optimal MAP bit detector $X_{\mathrm{MAP},l_0}(\mathbf{Y}^{l_0})$, where $l_0$ is a fixed integer. The MAP bit detector $X_{\mathrm{MAP},l_0}(\cdot)$ uses the same number of observations as in $X_{\mathrm{BP}}(\cdot)$ but is able to exploit the global knowledge about the entire codebook.

Proof: When the support tree $\mathcal{N}^{2l_0}_{(i,j)}$ is perfectly projected, the local information about the tree-satisfying strings is equivalent to the global information about the entire codebook. Therefore, the extra information about the entire codebook does not benefit the decision maker, and $X_{\mathrm{BP}}(\cdot) = X_{\mathrm{MAP},l_0}(\cdot)$. Theorem 2 shows that $\mathcal{N}^{2l_0}_{(i,j)}$ converges to perfect projection in probability, which in turn implies that for sufficiently large $n$, the BP decoder is locally optimal for almost all instances of the code ensemble. ■
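The statement can be made concrete on the smallest case in which the whole code is its own support tree. The sketch below is an illustration only, not the paper's construction: it assumes a length-3 single parity-check code (one check node, hence a cycle-free graph), a binary asymmetric channel with hypothetical crossover probabilities `EPS0` and `EPS1`, and a uniform prior over codewords. It compares the BP output LLR for bit 0 with the brute-force bit-MAP LLR computed from the full codebook.

```python
import itertools
import math

EPS0, EPS1 = 0.1, 0.3  # assumed crossover probabilities P(y=1|x=0) and P(y=0|x=1)

def likelihood(y, x):
    """Channel law P(y | x) of the assumed binary asymmetric channel."""
    if x == 0:
        return EPS0 if y == 1 else 1 - EPS0
    return EPS1 if y == 0 else 1 - EPS1

def llr(y):
    """Channel LLR ln P(y|0)/P(y|1)."""
    return math.log(likelihood(y, 0) / likelihood(y, 1))

def bp_llr_bit0(y):
    """BP on the single-check tree: channel LLR plus the check-to-variable message."""
    ext = 2 * math.atanh(math.tanh(llr(y[1]) / 2) * math.tanh(llr(y[2]) / 2))
    return llr(y[0]) + ext

def map_llr_bit0(y):
    """Bit-MAP LLR by enumerating all even-weight codewords."""
    num = den = 0.0
    for c in itertools.product((0, 1), repeat=3):
        if sum(c) % 2:
            continue  # not a codeword
        p = math.prod(likelihood(yi, ci) for yi, ci in zip(y, c))
        if c[0] == 0:
            num += p
        else:
            den += p
    return math.log(num / den)
```

In this toy case the tree-satisfying strings are exactly the codewords, so the two LLRs agree for every received word; Theorem 8 asserts that the same agreement holds with high probability for the depth-$2l_0$ neighborhood of a large random code.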



Note: Even when limiting ourselves to symmetric memoryless channels, this local optimality of BP can only be proved9 by the convergence to perfect projection. Theorem 8 can thus be viewed as a completion of the classical density evolution for symmetric memoryless channels.

VIII. Conclusions 

In this paper, we have developed a codeword-averaged den- 
sity evolution, which allows analysis of general non-symmetric 
memoryless channels. An essential perfect projection conver- 
gence theorem has been proved by a constraint propagation 
argument and by analyzing the behavior of random matrices. 
With this perfect projection convergence theorem, the theoret- 
ical foundation of the codeword-averaged density evolution is 
well established. Most of the properties of symmetric density 
evolution have been generalized and proved for the codeword- 
averaged density evolution on non-symmetric channels, in- 
cluding monotonicity, distribution symmetry, and stability. 
Besides a necessary stability condition, a sufficient stability 
condition has been stated with convergence rate arguments 
and a simple proof. 

The typicality of the linear LDPC code ensemble has been 
proved by the weak convergence (w.r.t. $d_c$) of the evolved
densities in our codeword-averaged density evolution. Namely,
when the check node degree is sufficiently large (e.g. $d_c \ge 6$),
the performance of the linear LDPC code ensemble is very 
close to (e.g. within 0.05%) the performance of the LDPC 
coset code ensemble. One important corollary to the perfect 
projection convergence theorem is the optimality of the belief 
propagation algorithms when the global information about 
the entire codebook is accessible. This can be viewed as a 
completion of the theory of classical density evolution for 
symmetric memoryless channels. 

Extensive simulations have been presented, the degree dis- 
tribution has been optimized for z-channels, and possible 
applications of our results have been discussed as well. From 
both practical and theoretical points of view, our codeword- 
averaged density evolution offers a straightforward and suc- 
cessful generalization of the traditional symmetric density 
evolution for general non-symmetric memoryless channels. 

Appendix I 
Proof of Theorem 2

We first introduce the following corollary:

Corollary 6 (Cycle-free Convergence): For a sequence $l_n = \frac{4}{9}\frac{\ln n}{\ln(d_v-1)+\ln(d_c-1)}$, we have for any $i_0, j_0$,

$$P\left(\mathcal{N}^{2l_n}_{(i_0,j_0)} \text{ is cycle-free}\right) = 1 - O\left(n^{-1/9}\right).$$

Proof of Theorem 2: In this proof, the subscript $(i_0, j_0)$ will be omitted for notational simplicity.

We notice that if for any $l_n > l$, $\mathcal{N}^{2l_n}$ is perfectly projected, then so is $\mathcal{N}^{2l}$. Choose $l_n = \frac{4}{9}\frac{\ln n}{\ln(d_v-1)+\ln(d_c-1)}$.

9 The existing cycle-free convergence theorem alone does not guarantee the local optimality of BP.



IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 8, AUGUST 2005 



17 



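By Corollary 6, we have

$$\begin{aligned}
P(\mathcal{N}^{2l} \text{ is perfectly projected})
&\ge P(\mathcal{N}^{2l_n} \text{ is perfectly projected}) \\
&\ge P(\mathcal{N}^{2l_n} \text{ is perfectly projected} \mid \mathcal{N}^{2(l_n+1)} \text{ is cycle-free}) \cdot P(\mathcal{N}^{2(l_n+1)} \text{ is cycle-free}) \\
&= P(\mathcal{N}^{2l_n} \text{ is perfectly projected} \mid \mathcal{N}^{2(l_n+1)} \text{ is cycle-free}) \left(1 - O(n^{-1/9})\right).
\end{aligned}$$

We then need only to show that

$$P(\mathcal{N}^{2l_n} \text{ is perfectly projected} \mid \mathcal{N}^{2(l_n+1)} \text{ is cycle-free}) = 1 - O(n^{-0.1}). \qquad (28)$$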



To prove (28), we take a deeper look at the incidence matrix (the parity check matrix) $A$, and use the (3, 5) regular code as our illustrative example. The proof is nonetheless general for all regular code ensembles. Conditioning on the event that the graph is cycle-free until depth $2 \cdot 2$, we can transform $A$ into the form of (29) by row and column swaps. Using $\otimes$ to denote the Kronecker product (whether it represents convolution or Kronecker product should be clear from the context), (29) can be further expressed as follows.

$$A = \begin{pmatrix}
I_{1\times 1} \otimes (1,1,1,1,1) & 0 & 0 \\
I_{5\times 5} \otimes (1,1)^{T} & I_{10\times 10} \otimes (1,1,1,1) & 0 \\
0 & I_{40\times 40} \otimes (1,1)^{T} & A' \\
0 & 0 & A''
\end{pmatrix},$$

where $I_{a\times a}$ denotes the $a \times a$ identity matrix, and $\begin{pmatrix} A'_{80\times(n-45)} \\ A''_{(\frac{3n}{5}-91)\times(n-45)} \end{pmatrix}$ is the incidence matrix of the equiprobable, bipartite subgraph, in which all $(n - 45)$ variable nodes have degree $d_v$, 80 check nodes have degree $d_c - 1$, and $(\frac{3n}{5} - 91)$ check nodes have degree $d_c$.
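Conditioning on a more general event that the graph is cycle-free until depth $2(l_n + 1)$ rather than $2 \cdot 2$, we will have

$$A = \begin{pmatrix}
A_{l_n} & 0 & 0 \\
\left(0 \,\middle|\, I_{(5\cdot 8^{l_n-1})\times(5\cdot 8^{l_n-1})} \otimes (1,1)^{T}\right) & I_{(10\cdot 8^{l_n-1})\times(10\cdot 8^{l_n-1})} \otimes (1,1,1,1) & 0 \\
0 & I_{(5\cdot 8^{l_n})\times(5\cdot 8^{l_n})} \otimes (1,1)^{T} & A' \\
0 & 0 & A''
\end{pmatrix},$$

where $A_{l_n}$ corresponds to the incidence matrix of the cycle-free graph of depth $2l_n$, and $\begin{pmatrix} A' \\ A'' \end{pmatrix}$ is the incidence matrix with rows (check nodes) in $A'$ and $A''$ having degree $(d_c - 1)$ and $d_c$, respectively. For convenience, we denote the blocks in $A$ as

$$A = \begin{pmatrix}
A_{l_n} & 0 & 0 \\
T_{l_n} & U_{l_n+1} & 0 \\
0 & T_{l_n+1} & A' \\
0 & 0 & A''
\end{pmatrix}.$$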



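Then $\mathcal{N}^{2l_n}$ is not perfectly projected if and only if there exists a non-zero row vector $(r|0|0)$ such that

$$(r|0|0) \in \mathrm{RowSpace}\begin{pmatrix}
T_{l_n} & U_{l_n+1} & 0 \\
0 & T_{l_n+1} & A' \\
0 & 0 & A''
\end{pmatrix} \qquad (30)$$

and

$$r \text{ is not in the row space of } A_{l_n}, \text{ or equivalently } (r|0|0) \text{ is not in } \mathrm{RowSpace}(A_{l_n}|0|0). \qquad (31)$$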



Eqs. (30) and (31) say that there exists a constraint $r$ on the variable nodes of $\mathcal{N}^{2l_n}$, which is not from the linear combination of those check node equations within $\mathcal{N}^{2l_n}$, but rather is imposed by the parity check equations outside $\mathcal{N}^{2l_n}$. It can be easily proved that if the matrix $\begin{pmatrix} A' \\ A'' \end{pmatrix}$ is of full row rank, then no such $r$ exists and $\mathcal{N}^{2l_n}$ is perfectly projected.10 Instead of proving $\begin{pmatrix} A' \\ A'' \end{pmatrix}$ is of full rank, we take a different approach, which takes care of the constraint propagation.
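The row-space condition can be tested directly on small examples. The following is a minimal sketch under stated assumptions: the parity check matrix `H` is a list of 0/1 rows, `local_cols` lists the variable nodes of the neighborhood, and `local_rows` lists the check nodes whose support lies entirely inside it. The helper names are hypothetical. The neighborhood is declared perfectly projected when every row-space vector supported on the local variables already lies in the span of the local checks, which is compared here through GF(2) ranks.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 rows."""
    basis = []  # bitmask rows, each reduced by the ones before it
    for r in rows:
        v = 0
        for bit in r:
            v = (v << 1) | bit
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b if it is set in v
        if v:
            basis.append(v)
    return len(basis)

def is_perfectly_projected(H, local_cols, local_rows):
    """Compare constraints induced on local_cols by all of H with the local checks."""
    outside = [c for c in range(len(H[0])) if c not in local_cols]
    # Dimension of the row-space vectors that vanish outside the neighborhood.
    induced = gf2_rank(H) - gf2_rank([[row[c] for c in outside] for row in H])
    local = gf2_rank([[H[r][c] for c in local_cols] for r in local_rows])
    return induced == local
```

For instance, with local variables `[0, 1]` and local check `[1, 1, 0, 0]`, adding the outside checks `[1, 0, 1, 1]` and `[0, 0, 1, 1]` produces the extra constraint `[1, 0, 0, 0]` on the neighborhood, so the function returns `False` for that matrix.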

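From (30), we know that, for $(r|0|0)$ to exist, there must exist a non-zero row vector $(0|s|0)$ such that

$$(0|s|0) \in \mathrm{RowSpace}\begin{pmatrix}
0 & T_{l_n+1} & A' \\
0 & 0 & A''
\end{pmatrix} \qquad (32)$$

and

$$s \in \mathrm{RowSpace}(U_{l_n+1}) = \mathrm{RowSpace}\left(I_{(10\cdot 8^{l_n-1})\times(10\cdot 8^{l_n-1})} \otimes (1,1,1,1)\right). \qquad (33)$$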

From (33), the 1's in $s$ must be aligned such that four neighboring bits should have the same value; for example, $s = (111100001111000000001111 \cdots 00001111)$.

Any non-zero $s$ satisfying (32) is generated by $T_{l_n+1}$. By applying the row symmetry in $A'$, we see that the 1's in any $s$ are uniformly distributed among all these $5 \cdot 8^{l_n}$ bits. Therefore, conditioning on the event that there exists a not-all-one $s$ satisfying Eq. (32), the probability that $s$ satisfies Eq. (33) is

$$\begin{aligned}
&P(s \text{ satisfies Eq. (33)} \mid \exists s \text{ satisfies Eq. (32) and is not } \mathbf{1}) \\
&\quad = P(\text{the 1's in } s \text{ are aligned} \mid \exists s \text{ satisfies Eq. (32) and is not } \mathbf{1}) \\
&\quad = \sum_{a=1}^{10\cdot 8^{l_n-1}-1} \frac{\binom{10\cdot 8^{l_n-1}}{a}}{\binom{5\cdot 8^{l_n}}{4a}} \cdot P(\text{there are } 4a \text{ ones in } s) \\
&\quad \le \frac{1}{\binom{5\cdot 8^{l_n}-1}{3}} = O\left(\left((d_v-1)(d_c-1)\right)^{-(d_c-2)l_n}\right). \qquad (34)
\end{aligned}$$
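The size of the alignment probability can be checked by direct counting. The sketch below is an illustration under the (3, 5) example with $l_n = 1$, that is, 40 bits arranged in 10 groups of 4; the function names are hypothetical. It computes, for each admissible weight, the fraction of equal-weight vectors whose ones fill whole groups, and takes the worst case over weights that are neither all-zero nor all-one.

```python
from math import comb

def aligned_fraction(num_groups, group_size, a):
    """Fraction of weight-(a*group_size) vectors whose ones fill exactly a whole groups."""
    return comb(num_groups, a) / comb(num_groups * group_size, a * group_size)

def worst_case_fraction(num_groups, group_size):
    """Largest aligned fraction over a = 1, ..., num_groups - 1 (all-zero and all-one excluded)."""
    return max(aligned_fraction(num_groups, group_size, a) for a in range(1, num_groups))
```

With 10 groups of 4, the worst case occurs at a single aligned group (and, by symmetry, at a single missing group) and equals $1/\binom{39}{3}$, which is why excluding the all-one vector matters for the bound.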



The last inequality follows from the assumption that $s$ is neither all-zero nor all-one. The reason why we can exclude the case that $s$ is all-one is that, if $d_v$ is odd, then there is an even number of 1's in each column of $T_{l_n}$. Since there is only one 1 in each column of $U_{l_n+1}$, by (30), an all-one $s$ can only generate an all-zero $r$, which puts no constraints on $\mathcal{N}^{2l_n}_{(i_0,j_0)}$. If $d_v$ is even, by the same reasoning, an all-one $s$ will generate $r$ of the form $(00 \cdots 011 \cdots 1)$. Nevertheless, when $d_v$ is even, this specific type of $r$ is in the row space of $A_{l_n}$, which does not fulfill the requirement in (31). From the above reasoning, we can exclude the all-one $s$.

10 Unfortunately, $\begin{pmatrix} A' \\ A'' \end{pmatrix}$ is not of full row rank. We can only show that with sufficiently large $n$, the row rank of $\begin{pmatrix} A' \\ A'' \end{pmatrix}$ converges to the number of rows minus one by methods similar to those in [39]. A simple constraint propagation argument is still necessary for this approach.

[Eq. (29): entrywise display of the incidence matrix $A$ of the (3, 5) regular code after row and column swaps, showing a first row of five 1's followed by staggered runs of 1111 and the blocks $A'$ and $A''$.]

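Let $m_r$ denote the number of rows of $\begin{pmatrix} A' \\ A'' \end{pmatrix}$ minus $\mathrm{Rank}\begin{pmatrix} A' \\ A'' \end{pmatrix}$. The number of vectors $s$ satisfying (32) is upper bounded by $2^{m_r}$. By (34), Proposition 4 (which will be formally stated and proved later), and the union bound, we have

$$\begin{aligned}
&P\left(\mathcal{N}^{2l_n}_{(i_0,j_0)} \text{ is not perfectly projected} \,\middle|\, \mathcal{N}^{2(l_n+1)}_{(i_0,j_0)} \text{ is cycle-free}\right) \\
&\quad = P(\exists r \text{ satisfying (30) and (31)}) \\
&\quad = P(\exists s, \text{ which satisfies (32) and (33), but is not all-one}) \\
&\quad \le n^{1.1}\, P(s \text{ satisfies Eq. (33)} \mid \exists s \text{ satisfies Eq. (32) and is not } \mathbf{1}) \cdot P(\#\text{ of } s \text{ is smaller than } n^{1.1}) \\
&\qquad + P(\#\text{ of } s \text{ is larger than } n^{1.1}) \\
&\quad \le n^{1.1} \times O\left(n^{-\frac{4}{9}(d_c-2)}\right) + P\left(2^{m_r} > n^{1.1}\right) \\
&\quad = O(n^{-0.1}), \quad \forall d_c \ge 5. \qquad (35)
\end{aligned}$$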



To prove the case $d_c < 5$, we focus on the probability that the constraints propagate two levels rather than just one level, i.e. instead of (28), we focus on proving the following statement:

$$P\left(\mathcal{N}^{2l_n}_{(i_0,j_0)} \text{ is perfectly projected} \,\middle|\, \mathcal{N}^{2(l_n+2)}_{(i_0,j_0)} \text{ is cycle-free}\right) = 1 - O(n^{-0.1}).$$

Most of the analysis remains the same. The conditional probability in (34) will be replaced by

$$\begin{aligned}
&P((0|s|0) \text{ is able to propagate two levels} \mid \exists s \text{ satisfying (32)}) \\
&\quad = P((0|s|0) \text{ propagates the 2nd level} \mid (0|s|0) \text{ propagates the 1st level}, \exists s \text{ satisfying (32)}) \\
&\qquad \cdot P((0|s|0) \text{ propagates the 1st level} \mid \exists s \text{ satisfying (32)})
\end{aligned}$$

(-t" 1 ) rr ) 



a . b 



{ 4a 



a J \ Ab ) 

P(4a and 46 l's to propagate the 2nd and 1st levels) 



(a) 
< 



= o 



4-4 
(<%-2d. 



where the inequality marked (a) follows from an analysis of the minimum number of bits required for the constraint propagation similar to that for the single level case. By this stronger inequality and a bounding inequality similar to that in (35), we thus complete the proof of the case $d_c \ge 3$ for all regular codes of practical interest.

■ 

Note: This constraint propagation argument shows that the convergence to a perfectly projected tree is very strong. Even for codes with redundant check node equations (not of full row rank), it is probabilistically hard for the external constraints to propagate inside and impose on the variable nodes within $\mathcal{N}^{2l}$. This property is helpful when we consider belief propagation decoding on the alternative graph representation as in [40].

We close this section by stating the proposition regarding $m_r$, the number of linearly dependent rows in $\begin{pmatrix} A' \\ A'' \end{pmatrix}$. The proof is left to Appendix II.

Proposition 4: Consider the semi-regular code ensemble $\mathcal{C}^n_{m',m''}(d_v, d_c)$ generated by equiprobable edge permutation on a bipartite graph with $n$ variable nodes of degree $d_v$, and $m'$ and $m''$ check nodes with respective degrees $(d_c - 1)$ and $d_c$. The corresponding parity check matrix is $A = \begin{pmatrix} A' \\ A'' \end{pmatrix}$. With $m_r$ denoting the number of linearly dependent rows in $A$, i.e. $m_r := m' + m'' - \mathrm{Rank}(A)$, we have

$$E\{2^{m_r}\} = O(n),$$

which automatically implies $P\left(2^{m_r} > n^{1+\alpha}\right) = P\left(m_r > \frac{(1+\alpha)\ln n}{\ln 2}\right) = O(n^{-\alpha})$ for any $\alpha > 0$.

Corollary 7: Let $R$ denote the rate of a regular LDPC code ensemble $\mathcal{C}^n(d_v, d_c)$, i.e., $R = \frac{n - \mathrm{Rank}(A)}{n}$, where $A$ is the corresponding parity check matrix. Then $R$ converges to $(n - m)/n$ in $L^1$, i.e.

$$\lim_{n\to\infty} E\left\{\left|R - \frac{n-m}{n}\right|\right\} = 0.$$

Proof: It is obvious that $R \ge \frac{n-m}{n}$. To show that $\limsup_{n\to\infty} E\left\{R - \frac{n-m}{n}\right\} = 0$, we let $m_r = m - \mathrm{Rank}(A)$ and rewrite $R = \frac{n - \mathrm{Rank}(A)}{n} = \frac{n-m}{n} + \frac{m_r}{n}$. By Proposition 4 and the fact that $\frac{m_r}{n} \le 1$, we have $\lim_{n\to\infty} E\left\{\frac{m_r}{n}\right\} = 0$. This completes the proof. ■

A stronger version of the convergence of $R$ with respect to the block length $n$ can be found in [39].

Appendix II
Proof of Proposition 4

We finish the proof of Proposition 4 by first stating the following lemma.

Lemma 2: For all $0 < k \in \mathbb{N}$, $0 < i < n \in \mathbb{N}$, we have

$$\frac{\binom{n}{i}}{\binom{kn}{ki}} \le \sqrt{k e^{1/3}}\; 2^{-(k-1)nH_2(i/n)}.$$

Proof: By Stirling's double inequality,

$$\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n+\frac{1}{12n+1}} < n! < \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n+\frac{1}{12n}},$$

we can prove

$$e^{-\frac{1}{6}} \le \frac{\binom{n}{i}}{2^{nH_2(i/n)}\sqrt{\frac{n}{2\pi i(n-i)}}} \le 1,$$

which immediately leads to the desired inequality. ■

Proof of Proposition 4: By the definition of $m_r$, we have $2^{m_r} = (\text{total \# of codewords})/2^{n-m}$, where $m = m' + m''$. Then

$$E\left\{\frac{2^{m_r}}{n}\right\} = \frac{1}{n 2^{n-m}} \sum_{i=0}^{n} E\{\#\text{ of codewords of weight } i\}.$$

Using the enumerating function as in [41], [39] and defining $g(i,x)$ as

$$g(i,x) := \frac{\left(\frac{(1+x)^{d_c-1} + (1-x)^{d_c-1}}{2}\right)^{m'} \left(\frac{(1+x)^{d_c} + (1-x)^{d_c}}{2}\right)^{m''}}{x^{i d_v}},$$

the above quantity can be further upper bounded as follows.

$$\begin{aligned}
E\left\{\frac{2^{m_r}}{n}\right\}
&\le \frac{2}{n 2^{n-m}} + \frac{1}{n 2^{n-m}} \sum_{i=1}^{n-1} \frac{\binom{n}{i}}{\binom{n d_v}{i d_v}} \inf_{x>0} g(i,x) \\
&\le \frac{2}{n 2^{n-m}} + \frac{\sqrt{d_v e^{1/3}}}{n 2^{n-m}} \sum_{i=1}^{n-1} 2^{-(d_v-1)nH_2(i/n)} \inf_{x>0} g(i,x) \\
&\le o(1) + \frac{\sqrt{d_v e^{1/3}}}{n} \sum_{i=1}^{n-1} 2^{m - d_v n H_2(i/n)} \inf_{x>0} g(i,x), \qquad (36)
\end{aligned}$$

where the second inequality follows from Lemma 2 and the third inequality follows from the fact that the binary entropy function is upper bounded by 1. By defining

$$f_n(i,x) := 2^{m - d_v n H_2(i/n)}\, g(i,x),$$

the summation in (36) is upper bounded11 by

$$\max_{i\in[0,n]} \inf_{x>0} f_n(i,x) \le \inf_{x>0} \max_{i\in[0,n]} f_n(i,x) \le \max_{i\in[0,n]} f_n(i,1).$$

By simple calculus, $\max_{i\in[0,n]} f_n(i,1)$ is attained when $i = n/2$. Since $f_n(n/2, 1) = 1$, the summation in (36) is bounded by 1 for all $n$, and therefore

$$\limsup_{n\to\infty} E\left\{\frac{2^{m_r}}{n}\right\} \le \sqrt{d_v e^{1/3}}.$$

The proof is complete.

11 The range of $i$ is expanded here from a discrete integer set to a continuous interval.

Appendix III
Proof of Corollary 5

We prove one direction that

$$p^*_{1\to 0,\mathrm{linear}} := \sup\left\{p_{1\to 0} > 0 : \lim_{l\to\infty} p^{(l)}_{e,\mathrm{linear}} = 0\right\} \ge \sup\left\{p_{1\to 0} > 0 : \lim_{l\to\infty} p^{(l)}_{e,\mathrm{coset}} = 0\right\} - \epsilon =: p^*_{1\to 0,\mathrm{coset}} - \epsilon.$$

The other direction that $p^*_{1\to 0,\mathrm{coset}} \ge p^*_{1\to 0,\mathrm{linear}} - \epsilon$ can be easily obtained by symmetry.

By definition, for any $\epsilon > 0$, we can find a sufficiently large $l_0 < \infty$ such that for a z-channel with one-way crossover probability $p_{1\to 0} := p^*_{1\to 0,\mathrm{coset}} - \epsilon$, $P^{(l_0)}_{\mathrm{coset}}$ is in the interior of the stability region. We note that the stability region depends only on the Bhattacharyya noise parameter of $P^{(l_0)}_{\mathrm{coset}}$, which is a continuous function with respect to convergence in distribution. Therefore, by Theorem 7 there exists a $\Delta \in \mathbb{N}$ such that $P^{(l_0)}_{\mathrm{linear}}$ is also in the stability region. By the definition of the stability region, we have $\lim_{l\to\infty} p^{(l)}_{e,\mathrm{linear}} = 0$, which implies $p^*_{1\to 0,\mathrm{linear}} \ge p_{1\to 0}$. The proof is thus complete.

Appendix IV
The Convergence Rates of (26) and (27)

For (26), we will consider the cases that $k = 0$ and $k = 1$ separately. By the BASC decomposition argument, namely, all non-symmetric channels can be decomposed as the probabilistic combination of many BASCs, we can limit our attention to simple BASCs rather than general memoryless non-symmetric channels. Suppose $P^{(l-1)}_{a.p.}(0)$ and $P^{(l-1)}_{a.p.}(1)$ correspond to a BASC with crossover probabilities $\epsilon_0$ and $\epsilon_1$. Without loss of generality, we may assume $\epsilon_0 + \epsilon_1 < 1$ because of the previous assumption that $\forall x \in \mathrm{GF}(2)$, $P^{(l-1)}_{a.p.}(x)(m = 0) = 0$. We then have



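$$\Phi_{P'_0}\!\left(k,\frac{r}{\Delta}\right) = (1-\epsilon_0)\, e^{i\frac{r}{\Delta}\ln\frac{1-\epsilon_0+\epsilon_1}{1-\epsilon_0-\epsilon_1}} + (-1)^k \epsilon_0\, e^{i\frac{r}{\Delta}\ln\frac{1+\epsilon_0-\epsilon_1}{1-\epsilon_0-\epsilon_1}}$$

and

$$\Phi_{P'_1}\!\left(k,\frac{r}{\Delta}\right) = (1-\epsilon_1)\, e^{i\frac{r}{\Delta}\ln\frac{1+\epsilon_0-\epsilon_1}{1-\epsilon_0-\epsilon_1}} + (-1)^k \epsilon_1\, e^{i\frac{r}{\Delta}\ln\frac{1-\epsilon_0+\epsilon_1}{1-\epsilon_0-\epsilon_1}}.$$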

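By Taylor's expansion, for $k = 0$, (26) becomes

$$2\left(\frac{\Phi_{P'_0}\!\left(0,\frac{r}{\Delta}\right) - \Phi_{P'_1}\!\left(0,\frac{r}{\Delta}\right)}{2}\right)^{\Delta} = 2\left(i\frac{r}{2\Delta}(1-\epsilon_0-\epsilon_1)\ln\frac{1-\epsilon_0+\epsilon_1}{1+\epsilon_0-\epsilon_1} + O(\Delta^{-2})\right)^{\Delta},$$

which converges to zero with convergence rate $O\left(O(\Delta)^{-\Delta}\right)$. For $k = 1$, we have

$$2\left(\frac{\Phi_{P'_0}\!\left(1,\frac{r}{\Delta}\right) - \Phi_{P'_1}\!\left(1,\frac{r}{\Delta}\right)}{2}\right)^{\Delta} = 2\left((\epsilon_1 - \epsilon_0) + i\frac{r}{2\Delta}\left((1-\epsilon_0+\epsilon_1)\ln\frac{1-\epsilon_0+\epsilon_1}{1-\epsilon_0-\epsilon_1} - (1+\epsilon_0-\epsilon_1)\ln\frac{1+\epsilon_0-\epsilon_1}{1-\epsilon_0-\epsilon_1}\right) + O(\Delta^{-2})\right)^{\Delta},$$

which converges to zero with convergence rate $O(\mathrm{const}^{\Delta})$, where const satisfies $|\epsilon_1 - \epsilon_0| < \mathrm{const} < 1$. Since the convergence rate is determined by the slower of the above two, we have proven that (26) converges to zero with rate $O(\mathrm{const}^{\Delta})$ for some $\mathrm{const} < 1$.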

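Consider (27). Since we assume that the input is not perfect, we have $\max(\epsilon_0, \epsilon_1) > 0$. For $k = 0$, by Taylor's expansion, we have

$$\left(\frac{\Phi_{P'_0}\!\left(0,\frac{r}{\Delta}\right) + \Phi_{P'_1}\!\left(0,\frac{r}{\Delta}\right)}{2}\right)^{\Delta} = \left(1 + i\frac{r}{2\Delta}\left((1-\epsilon_0+\epsilon_1)\ln\frac{1-\epsilon_0+\epsilon_1}{1-\epsilon_0-\epsilon_1} + (1+\epsilon_0-\epsilon_1)\ln\frac{1+\epsilon_0-\epsilon_1}{1-\epsilon_0-\epsilon_1}\right) + O(\Delta^{-2})\right)^{\Delta},$$

which converges to

$$e^{i\frac{r}{2}\left((1-\epsilon_0+\epsilon_1)\ln\frac{1-\epsilon_0+\epsilon_1}{1-\epsilon_0-\epsilon_1} + (1+\epsilon_0-\epsilon_1)\ln\frac{1+\epsilon_0-\epsilon_1}{1-\epsilon_0-\epsilon_1}\right)}$$

with rate $O(\Delta^{-1})$. For $k = 1$, we have

$$\left(\frac{\Phi_{P'_0}\!\left(1,\frac{r}{\Delta}\right) + \Phi_{P'_1}\!\left(1,\frac{r}{\Delta}\right)}{2}\right)^{\Delta} = \left((1-\epsilon_0-\epsilon_1)\,\frac{e^{i\frac{r}{\Delta}\ln\frac{1-\epsilon_0+\epsilon_1}{1-\epsilon_0-\epsilon_1}} + e^{i\frac{r}{\Delta}\ln\frac{1+\epsilon_0-\epsilon_1}{1-\epsilon_0-\epsilon_1}}}{2}\right)^{\Delta},$$

which converges to zero with rate $O\left((1-\epsilon_0-\epsilon_1)^{\Delta}\right)$. Since the overall convergence rate is the slower of the above two, we have proven that the convergence rate is $O(\Delta^{-1})$.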
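The two rates can be compared numerically. The sketch below is illustrative only: it assumes the BASC transforms take the form $(1-\epsilon_x)e^{isa_x} + (-1)^k\epsilon_x e^{isb_x}$ with the two logarithmic magnitudes used in this appendix, and the crossover values passed in are hypothetical. It evaluates the difference term of (26) and the average term of (27) for a given $\Delta$.

```python
import cmath
import math

def phis(k, s, e0, e1):
    """Transforms of P'_0 and P'_1 for a BASC with crossovers e0, e1 (e0 + e1 < 1)."""
    a = math.log((1 - e0 + e1) / (1 - e0 - e1))
    b = math.log((1 + e0 - e1) / (1 - e0 - e1))
    sign = (-1) ** k
    phi0 = (1 - e0) * cmath.exp(1j * s * a) + sign * e0 * cmath.exp(1j * s * b)
    phi1 = (1 - e1) * cmath.exp(1j * s * b) + sign * e1 * cmath.exp(1j * s * a)
    return phi0, phi1

def difference_26(k, r, delta, e0, e1):
    """Right-hand side of (26): linear-versus-coset discrepancy."""
    p0, p1 = phis(k, r / delta, e0, e1)
    return 2 * ((p0 - p1) / 2) ** delta

def average_27(k, r, delta, e0, e1):
    """Right-hand side of (27): averaged output transform."""
    p0, p1 = phis(k, r / delta, e0, e1)
    return ((p0 + p1) / 2) ** delta

def limit_27(r, e0, e1):
    """Limit of average_27 for k = 0 as delta grows."""
    a = math.log((1 - e0 + e1) / (1 - e0 - e1))
    b = math.log((1 + e0 - e1) / (1 - e0 - e1))
    return cmath.exp(1j * (r / 2) * ((1 - e0 + e1) * a + (1 + e0 - e1) * b))
```

For example, with `e0 = 0.05` and `e1 = 0.2`, the magnitude of `difference_26(1, 1.0, delta, e0, e1)` shrinks roughly like $0.15^{\Delta}$, whereas the gap between `average_27(0, 1.0, delta, e0, e1)` and `limit_27(1.0, e0, e1)` shrinks only about tenfold when `delta` grows from 100 to 1000.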
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